Abstract. The GLV method of Gallant, Lambert and Vanstone (CRYPTO 2001) computes any multiple $kP$ of a point $P$ of prime order $n$ lying on an elliptic curve with a low-degree endomorphism $\Phi$ (called GLV curve) over $\mathbb{F}_p$ as

$$kP = k_1P + k_2\Phi(P), \quad \text{with } \max\{|k_1|, |k_2|\} < C_1\sqrt{n}$$

for some explicit constant $C_1 > 0$. Recently, Galbraith, Lin and Scott (EUROCRYPT 2009) extended this method to all curves over $\mathbb{F}_{p^2}$ which are twists of curves defined over $\mathbb{F}_p$. We show in this work how to merge the two approaches in order to get, for twists of any GLV curve over $\mathbb{F}_{p^2}$, a four-dimensional decomposition together with fast endomorphisms $\Phi, \Psi$ over $\mathbb{F}_{p^2}$ acting on the group generated by a point $P$ of prime order $n$, resulting in a proved decomposition for any scalar $k \in [1, n]$

$$kP = k_1P + k_2\Phi(P) + k_3\Psi(P) + k_4\Psi\Phi(P), \quad \text{with } \max_i |k_i| < C_2\, n^{1/4}$$

for some explicit $C_2 > 0$. Furthermore, taking the best $C_1, C_2$, we get $C_2/C_1 < 408$, independently of the curve, ensuring a constant relative speedup.

We also derive new families of GLV curves, corresponding to those curves with degree 3 endomorphisms. 
Keywords. Elliptic curves, GLV method, Scalar Multiplication. 

1 Introduction

The Gallant-Lambert-Vanstone (GLV) method is a generic method to speed up computation on 
some elliptic curves over fields of large characteristic. Given a curve with a point P of prime order n, 
it consists essentially in an algorithm to find a decomposition of an arbitrary scalar multiplication 
$kP$ for $k \in [1, n]$ into two scalar multiplications with the new scalars having only about half
the original bits. We call such a method two-dimensional, since if scalar multiplications can be 
parallelized, then a twofold performance speedup can be achieved. 

Whereas the original GLV method as defined in [3] works on curves over $\mathbb{F}_p$ with an endomorphism of small degree (GLV curves), Galbraith-Lin-Scott (GLS) in [2] have shown that over $\mathbb{F}_{p^2}$ one can expect to find many more such curves by basically exploiting the action of the Frobenius endomorphism. One can therefore expect that on the particular GLV curves, this new insight will lead to improvements over $\mathbb{F}_{p^2}$. Indeed the GLS article itself considers fourfold speedups on GLV curves with nontrivial automorphisms (corresponding to the degree one cases) but leaves the other cases open to investigation.
Recently a paper by Zhou, Hu, Xu and Song [12] has shown that it is possible to combine the two approaches by introducing a three-dimensional version of the GLV method (thus getting three scalars with a threefold speedup), which seems to work to a certain degree, albeit with no justification other than practical implementations.

In contrast, we would like to show that the most natural understanding of their ideas is in four dimensions, where we are then able to construct, for the same curves and fast endomorphisms $\Phi, \Psi$ over $\mathbb{F}_{p^2}$ acting on a cyclic group generated by a point $P$ of prime order $n$, a proved decomposition for any scalar $k \in [1, n]$

$$kP = k_1P + k_2\Phi(P) + k_3\Psi(P) + k_4\Psi\Phi(P) \quad \text{with } \max_i |k_i| < C\, n^{1/4}$$


for some explicitly computable $C$. If parallel computation is available, then the computation of $kP$ can possibly be implemented up to four times as fast as a traditional scalar multiplication. It recently came to our attention that Hu, Longa and Xu [4] also provided a similar bound in the case of curves with $j$-invariant 0. Our analysis supplements theirs by considering all GLV curves, for which we provide a unified treatment.

The way to prove this bound is to study the kernel lattice of the GLV reduction map in dimension four. The LLL algorithm [5] then finds a suitable reduced basis together with a useful bound to deduce $C$. In the last part of the article, we develop another approach which gives a reduced basis faster than the LLL algorithm together with a much better value for $C$. Indeed our reduction algorithm runs in $O(\log^2 n)$ compared to $O(\log^3 n)$ for LLL, and the improved $C = O(\sqrt{s})$ compares with the value obtained with LLL, which is only $O(s^{3/2})$. This allows us to prove that the relative speedup in going from a two-dimensional to a four-dimensional GLV method is independent of the curve.

2 The GLV Method 

In this section we briefly summarize the GLV method following [9]. Let $E$ be an elliptic curve defined over a finite field $\mathbb{F}_q$ and $P$ be a point of this curve with prime order $n$ such that the cofactor $h = \#E(\mathbb{F}_q)/n$ is small, say $h < 4$. Let us consider $\Phi$ a nontrivial endomorphism defined over $\mathbb{F}_q$ and $X^2 + rX + s$ its characteristic polynomial. In all the examples $r$ and $s$ are actually small fixed integers and $q$ is varying in some family. By hypothesis there is only one subgroup of order $n$ in $E(\mathbb{F}_q)$, implying that $\Phi(P) = \lambda P$ for some $\lambda \in [0, n-1]$, since $\Phi(P)$ has order dividing the prime $n$. In particular, $\lambda$ is obtained as a root of $X^2 + rX + s$ modulo $n$.
Define the group homomorphism (the GLV reduction map)

$$f\colon \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}/n, \qquad (i, j) \mapsto i + \lambda j \pmod{n}.$$

Let $\mathcal{K} = \ker f$. It is a sublattice of $\mathbb{Z} \times \mathbb{Z}$ of rank 2 since the quotient is finite. Let $\kappa > 0$ be a constant (depending on the curve) such that we can find $v_1, v_2$ two linearly independent vectors of $\mathcal{K}$ satisfying $\max\{|v_1|, |v_2|\} < \kappa\sqrt{n}$, where $|\cdot|$ denotes the rectangle norm$^4$. Express

$$(k, 0) = \beta_1 v_1 + \beta_2 v_2,$$

4 The rectangle norm of $(x, y)$ is by definition $\max(|x|, |y|)$. As remarked in [9], we can replace it by any other metric norm. We will use the term "short" to denote smallness in the rectangle norm.



Four-Dimensional GLV Method 3 



where $\beta_j \in \mathbb{Q}$. Then round to the nearest integer $b_j = \lfloor \beta_j \rceil = \lfloor \beta_j + 1/2 \rfloor$ and let $v = b_1 v_1 + b_2 v_2$.



Note that $v \in \mathcal{K}$ and that $u = (k, 0) - v$ is short. Indeed, since $|\beta_j - b_j| \le 1/2$, the triangle inequality gives

$$|u| \le \frac{|v_1| + |v_2|}{2} < \kappa\sqrt{n}.$$

If we set $(k_1, k_2) = u$, then we get $k \equiv k_1 + k_2\lambda \pmod{n}$, or equivalently $kP = k_1P + k_2\Phi(P)$, with $\max(|k_1|, |k_2|) < \kappa\sqrt{n}$.

In [9], the optimal value of $\kappa$ (with respect to large values of $n$, i.e. large fields, keeping $X^2 + rX + s$ constant) is determined. Let $\Delta = r^2 - 4s$ be the discriminant of the characteristic polynomial of $\Phi$. Then the optimal $\kappa$ is given by the following result$^5$.

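Theorem 1 ([9, Theorem 4]). Assuming $n$ is the norm of an element of $\mathbb{Z}[\Phi]$, the optimal value of $\kappa$ is

$$\kappa = \begin{cases} \dfrac{\sqrt{s}}{2}\left(1 + \dfrac{1}{|\Delta|}\right), & \text{if } r \text{ is odd},\\[2mm] \dfrac{\sqrt{s}}{2}\sqrt{1 + \dfrac{4}{|\Delta|}}, & \text{if } r \text{ is even}. \end{cases}$$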
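As a quick numerical check of Theorem 1 and of footnote 5, the following sketch simply evaluates the two cases of the formula; the helper name `optimal_kappa` is hypothetical, and the pairs $(r, s)$ are read off the characteristic polynomials of Examples 1-6 in Section 4.

```python
import math


def optimal_kappa(r: int, s: int) -> float:
    """Optimal kappa of Theorem 1 for the characteristic polynomial X^2 + r*X + s."""
    delta = r * r - 4 * s  # discriminant, negative for a CM endomorphism
    if delta >= 0:
        raise ValueError("expected a negative discriminant")
    if r % 2 == 1:  # r odd (Python's % also returns 1 for negative odd r)
        return math.sqrt(s) / 2 * (1 + 1 / abs(delta))
    return math.sqrt(s) / 2 * math.sqrt(1 + 4 / abs(delta))


# (r, s) for the characteristic polynomials of Examples 1-6.
examples = {"E1": (0, 1), "E2": (1, 1), "E3": (-1, 2),
            "E4": (0, 2), "E5": (-1, 3), "E6": (0, 3)}
for name, (r, s) in examples.items():
    print(name, round(optimal_kappa(r, s), 4))
```

For $E_2$ and $E_3$ this reproduces the corrected values $2/3$ and $4\sqrt{2}/7$ of footnote 5.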



3 The GLS Improvement 

In 2009, Galbraith, Lin and Scott [2] realised that we don't need to have $\Phi^2 + r\Phi + s = 0$ in $\mathrm{End}(E)$ but only on a subgroup of $E(\mathbb{F})$ for a specific finite field $\mathbb{F}$. In particular, considering $\Psi = \mathrm{Frob}_p$ the $p$-Frobenius endomorphism of a curve $E$ defined over $\mathbb{F}_p$, we know that $\Psi^m(P) = P$ for all $P \in E(\mathbb{F}_{p^m})$. While this says nothing useful if $m = 1, 2$, it does offer new nontrivial relations for higher degree extensions. The case $m = 4$ is particularly useful here.

In this case, if $P \in E(\mathbb{F}_{p^4}) \setminus E(\mathbb{F}_{p^2})$, then $\Psi^2(P) = -P$ and hence, on the subgroup generated by $P$, $\Psi$ satisfies the equation $X^2 + 1 = 0$. This implies that if $\Psi(P)$ is a multiple of $P$ (which happens as soon as the order $n$ of $P$ is sufficiently large, say at least $2p$), we can apply the previous GLV construction and split again a scalar multiplication as $kP = k_1P + k_2\Psi(P)$, with $\max(|k_1|, |k_2|) = O(\sqrt{n})$. Contrast this with the characteristic polynomial of $\Psi$, which is $X^2 - a_pX + p$ for some integer $a_p$: its coefficients are not constant but grow with $p$, so we cannot apply the GLV paradigm to it as efficiently.

For efficiency reasons, however, one does not work with $E/\mathbb{F}_{p^4}$ directly but with $E'/\mathbb{F}_{p^2}$ isomorphic to $E$ over $\mathbb{F}_{p^4}$ but not over $\mathbb{F}_{p^2}$, that is, a quadratic twist over $\mathbb{F}_{p^2}$. In this case, it is possible that $\#E'(\mathbb{F}_{p^2}) = n \ge (p-1)^2$ is prime. Furthermore, if $\psi\colon E \to E'$ is an isomorphism defined over $\mathbb{F}_{p^4}$, then the endomorphism $\Psi = \psi\,\mathrm{Frob}_p\,\psi^{-1} \in \mathrm{End}(E')$ satisfies the equation $X^2 + 1 = 0$, and if $p \equiv 5 \pmod 8$ it can be defined over $\mathbb{F}_p$.

This idea is at the heart of the GLS approach, but it only works for curves over $\mathbb{F}_{p^m}$ with $m > 1$; therefore it does not generalise the original GLV method but rather complements it.



5 There is a mistake in [9] in the derivation of $\kappa$ for odd values of $r$. This affects [9, Corollary 1] for curves $E_2$ and $E_3$, where the correct values of $\kappa$ are respectively $2/3$ and $4\sqrt{2}/7$.
4 Examples

We give a few examples of GLV curves, which are curves defined over $\mathbb{C}$ with complex multiplication by a quadratic integer of small norm, corresponding to an endomorphism $\phi$ of small degree$^6$. They make up an exhaustive list, up to isomorphism, in increasing order of endomorphism degree up to degree 3. While the first four examples appear in the previous literature, the next ones (degree 3) are new and have been computed with the Stark algorithm [10].

Example 1. Let $p \equiv 1 \pmod 4$ be a prime. Define an elliptic curve $E_1$ over $\mathbb{F}_p$ by

$$y^2 = x^3 + ax.$$

If $\beta$ is an element of order 4, then the map $\phi$ defined in the affine plane by

$$\phi(x, y) = (-x, \beta y)$$

is an endomorphism of $E_1$ defined over $\mathbb{F}_p$ with $\mathrm{End}(E_1) = \mathbb{Z}[\phi] = \mathbb{Z}[\sqrt{-1}]$, since $\phi$ satisfies the equation

$$\phi^2 + 1 = 0.$$

Example 2. Let $p \equiv 1 \pmod 3$ be a prime. Define an elliptic curve $E_2$ over $\mathbb{F}_p$ by

$$y^2 = x^3 + b.$$

If $\gamma$ is an element of order 3, then we have an endomorphism $\phi$ defined over $\mathbb{F}_p$ by

$$\phi(x, y) = (\gamma x, y),$$

and $\mathrm{End}(E_2) = \mathbb{Z}[\phi] = \mathbb{Z}\left[\frac{1+\sqrt{-3}}{2}\right]$, since $\phi$ satisfies the equation

$$\phi^2 + \phi + 1 = 0.$$

Example 3. Let $p > 3$ be a prime such that $-7$ is a quadratic residue modulo $p$. Define an elliptic curve $E_3$ over $\mathbb{F}_p$ by

…

Then $\mathrm{End}(E_3) = \mathbb{Z}[\phi] = \mathbb{Z}\left[\frac{1+\sqrt{-7}}{2}\right]$, since $\phi$ satisfies the equation

$$\phi^2 - \phi + 2 = 0.$$

6 By small we mean really small, usually less than 5. In particular, for cryptographic applications, the degree is much smaller than the field size.

Example 4. Let $p > 3$ be a prime such that $-2$ is a quadratic residue modulo $p$. Define an elliptic curve $E_4$ over $\mathbb{F}_p$ by

$$y^2 = 4x^3 - 30x - 28$$

together with the $\mathbb{F}_p$-endomorphism $\phi$ defined$^7$ by

$$\phi(x, y) = \left(-\frac{2x^2 + 4x + 9}{4(x + 2)},\; -y\,\frac{2x^2 + 8x - 1}{4\sqrt{-2}\,(x + 2)^2}\right).$$

We have $\mathrm{End}(E_4) = \mathbb{Z}[\phi] = \mathbb{Z}[\sqrt{-2}]$ since $\phi$ satisfies the equation

$$\phi^2 + 2 = 0.$$

Example 5. Let $p > 3$ be a prime such that $-11$ is a quadratic residue modulo $p$. We define the elliptic curve $E_5$ over $\mathbb{F}_p$ by

$$y^2 = x^3 - \frac{13824}{539}x + \frac{27648}{539},$$

with $a = (1 + \sqrt{-11})/2$, and the endomorphism $\phi$ defined by



(x,y) = 



539 1 _539_\ 3 1 /28 _ 35 W 2 4- f-^n J.8W4- ±™n -A- 192 
5184" 1728/ x 127" 18/ x Z_\ 9 _ 3/ x 77 _ 77 
'2695 „ 539^2 1 f 217 „ , 49^, 64 4 



(2695 539 W2 217 49\ 

\ Ul84 u 864/ x ~V 54 " ~ 18/ x ^ 9 ** 3 

f 3773 Q - 18865 \ 3 ■ f_^69A a + _539_\ ^2 , ( J7_ a _ iL ) x + 20 , 1 \ 
V 373248" 995328/ x ^ V 20736 ^ 3456/ x ^1432" 144 / x ^ 27 ^ 9 I 
f 18865 , 116963 ^ 3 , / 7007 _ 539\ 2 1 (_791„ , 581 \ , 74„ _ 35 
I 1492992" "i" 995328/ x ^ V 20736 " 432/ x ^ I 432 u ^ 144/ x 27 u 9 / 

such that $\mathrm{End}(E_5) = \mathbb{Z}[\phi] = \mathbb{Z}\left[\frac{1+\sqrt{-11}}{2}\right]$. The characteristic polynomial of $\phi$ is

$$\phi^2 - \phi + 3 = 0.$$

Example 6. Let $p > 3$ be a prime such that $-3$ is a quadratic residue modulo $p$. We define the elliptic curve $E_6$ over $\mathbb{F}_p$ by

$$y^2 = x^3 - \frac{3375}{121}x + \frac{6750}{121},$$

with the endomorphism $\phi$ defined by

$$\phi(x, y) = \left(-\frac{1331x^3 - 10890x^2 + 81675x - 189000}{33(11x - 45)^2},\; y\,\frac{1331x^3 - 16335x^2 + 7425x + 43875}{3\sqrt{-3}\,(11x - 45)^3}\right)$$

such that$^8$ $\mathrm{End}(E_6) = \mathbb{Z}[\phi] = \mathbb{Z}[\sqrt{-3}]$. The characteristic polynomial of $\phi$ is

$$\phi^2 + 3 = 0.$$
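The formulas of Examples 4 and 6 can be spot-checked with exact rational arithmetic. Writing $\phi(x, y) = (x', y\,R(x))$, the image lies on $y^2 = f(x)$ exactly when $f(x') = f(x)\,R(x)^2$; since $R$ contains $\sqrt{-2}$ (resp. $3\sqrt{-3}$) only in its denominator, $R(x)^2$ is rational and no square roots are needed. The sketch below, with illustrative function names, tests this identity at finitely many rational $x$ only; it is a sanity check, not a proof that $\phi$ is an endomorphism.

```python
from fractions import Fraction


def f4(x):
    """Right-hand side of E_4: y^2 = 4x^3 - 30x - 28."""
    return 4 * x**3 - 30 * x - 28


def phi4_x(x):
    return -(2 * x**2 + 4 * x + 9) / (4 * (x + 2))


def phi4_ratio_sq(x):
    """Square of the factor multiplying y in phi for E_4, using (sqrt(-2))^2 = -2."""
    return (2 * x**2 + 8 * x - 1) ** 2 / (16 * -2 * (x + 2) ** 4)


def f6(x):
    """Right-hand side of E_6: y^2 = x^3 - (3375/121) x + 6750/121."""
    return x**3 - Fraction(3375, 121) * x + Fraction(6750, 121)


def phi6_x(x):
    return -(1331 * x**3 - 10890 * x**2 + 81675 * x - 189000) / (33 * (11 * x - 45) ** 2)


def phi6_ratio_sq(x):
    """Square of the factor multiplying y in phi for E_6, using (3 sqrt(-3))^2 = -27."""
    return (1331 * x**3 - 16335 * x**2 + 7425 * x + 43875) ** 2 / (-27 * (11 * x - 45) ** 6)


def maps_into_curve(f, x_map, ratio_sq, xs):
    """phi(x, y) = (x', y*R(x)) lies on y^2 = f(x) iff f(x') == f(x) * R(x)^2."""
    return all(f(x_map(x)) == f(x) * ratio_sq(x) for x in xs)


# Sample rationals, skipping the poles x = -2 (E_4) and x = 45/11 (E_6).
xs = [Fraction(a, b) for a in range(-7, 8) for b in (1, 2, 3)]
print(maps_into_curve(f4, phi4_x, phi4_ratio_sq, [x for x in xs if x != -2]))
print(maps_into_curve(f6, phi6_x, phi6_ratio_sq, [x for x in xs if 11 * x != 45]))
```

Because $y$ enters $\phi$ only through the factor $y\,R(x)$, the overall sign of the second coordinate plays no role in this check, in line with footnote 7.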



7 We take the opportunity to correct a typo found and transmitted in many sources, where a $y$ factor was absent in the second coordinate. Its sign is irrelevant.

8 This is the first example where the endomorphism ring is not the maximal order of its field of fractions. It can be summarily seen as follows: $\mathrm{End}(E) \supseteq \mathbb{Z}[\sqrt{-3}]$. If not equal, then it must be the full ring of integers $\mathbb{Z}\left[\frac{1+\sqrt{-3}}{2}\right]$. This would imply that $j = 0$, as there is only $h(-3) = 1$ isomorphism class of elliptic curves with complex multiplication by $\mathbb{Z}\left[\frac{1+\sqrt{-3}}{2}\right]$, given in Example 2 (see [10] for an abridged description of the theory of complex multiplication). This is clearly not the case here. Alternatively, one can see that there would exist a nontrivial automorphism (a primitive cube root of unity) corresponding to $(-1 + \phi)/2$. A direct computation then shows this is impossible.
5 Combining GLV and GLS

Let $E/\mathbb{F}_p$ be a GLV curve. As in Section 3, we will denote by $E'/\mathbb{F}_{p^2}$ a quadratic twist that is $\mathbb{F}_{p^4}$-isomorphic to $E$ via the isomorphism $\psi\colon E \to E'$. We also suppose that $\#E'(\mathbb{F}_{p^2}) = nh$ where $n$ is prime and $h < 4$. We then have the two endomorphisms of $E'$, $\Psi = \psi\,\mathrm{Frob}_p\,\psi^{-1}$ and $\Phi = \psi\phi\psi^{-1}$, with $\phi$ the GLV endomorphism coming with the definition of a GLV curve. They are both defined over $\mathbb{F}_{p^2}$: if $\sigma$ is the nontrivial Galois automorphism of $\mathbb{F}_{p^4}/\mathbb{F}_{p^2}$, then $\psi^\sigma = -\psi$, so that $\Psi^\sigma = \psi^\sigma\,\mathrm{Frob}_p\,(\psi^{-1})^\sigma = (-\psi)\,\mathrm{Frob}_p\,(-\psi^{-1}) = \Psi$, meaning that $\Psi \in \mathrm{End}_{\mathbb{F}_{p^2}}(E')$. Similarly for $\Phi$, where we are using the fact that $\phi \in \mathrm{End}_{\mathbb{F}_p}(E)$. Notice that $\Psi^2 + 1 = 0$ and that $\Phi$ has the same characteristic polynomial as $\phi$. Furthermore, since we have a large subgroup $\langle P \rangle \subset E'(\mathbb{F}_{p^2})$ of prime order, $\Phi(P) = \lambda P$ and $\Psi(P) = \mu P$ for some $\lambda, \mu \in [1, n-1]$. We will assume that $\Phi$ and $\Psi$, when viewed as algebraic integers, generate disjoint quadratic extensions of $\mathbb{Q}$. In particular, we are not dealing with Example 1, but this case can be treated separately with a quartic twist, as was hinted in [2].

Consider the biquadratic number field $K = \mathbb{Q}(\Phi, \Psi)$ (Galois of degree 4, with Galois group $\mathbb{Z}/2 \times \mathbb{Z}/2$). Let $\mathcal{O}_K$ be its ring of integers. The following analysis is inspired by Sica, Ciet and Quisquater [9, Section 8].

We have $\mathbb{Z}[\Phi, \Psi] \subset \mathcal{O}_K$. Since the degrees of $\Phi$ and $\Psi$ are much smaller than $n$, the prime $n$ is unramified in $K$, and the existence of $\lambda$ and $\mu$ above means that $n$ splits in $\mathbb{Q}(\Phi)$ and $\mathbb{Q}(\Psi)$, namely that $n$ splits completely in $K$. There exists therefore a prime ideal $\mathfrak{n}$ of $\mathcal{O}_K$ dividing $n\mathcal{O}_K$, such that its norm is $n$. We can also suppose that $\Phi \equiv \lambda \pmod{\mathfrak{n}}$ and $\Psi \equiv \mu \pmod{\mathfrak{n}}$. The four-dimensional GLV (4-GLV) method works as follows.

Consider the 4-GLV reduction map $F$ defined by

$$F\colon \mathbb{Z}^4 \to \mathbb{Z}/n, \qquad (x_1, x_2, x_3, x_4) \mapsto x_1 + x_2\lambda + x_3\mu + x_4\lambda\mu \pmod{n}.$$

If we can find four linearly independent vectors $v_1, \dots, v_4 \in \ker F$ with $\max_i |v_i| < C n^{1/4}$ for some constant $C > 0$, then for any $k \in [1, n-1]$ we write

$$(k, 0, 0, 0) = \sum_{j=1}^{4} \beta_j v_j,$$

with $\beta_j \in \mathbb{Q}$. As in the GLV method one sets $v = \sum_{j=1}^{4} \lfloor \beta_j \rceil v_j$ and

$$u = (k, 0, 0, 0) - v = (k_1, k_2, k_3, k_4).$$

We then get

$$kP = k_1P + k_2\Phi(P) + k_3\Psi(P) + k_4\Psi\Phi(P) \quad \text{with } \max_i |k_i| < 2C n^{1/4}. \qquad (1)$$
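Equation (1) can be illustrated by a short sketch that solves $(k, 0, 0, 0) = \sum_j \beta_j v_j$ exactly over $\mathbb{Q}$ by Gauss-Jordan elimination, rounds the $\beta_j$ and returns $u$. It assumes the four input vectors are linearly independent elements of $\ker F$. The toy values ($n = 1009$, $\lambda = 374$ with $\lambda^2 + \lambda + 1 \equiv 0$, $\mu = 469$ with $\mu^2 \equiv -1 \pmod n$) and the listed short basis of $\ker F$, which can be obtained for instance with the algorithms of Section 6, are illustrative and are not taken from the paper.

```python
from fractions import Fraction
import math


def solve_rational(cols, target):
    """Solve sum_j beta_j * cols[j] = target over Q by Gauss-Jordan elimination."""
    dim = len(target)
    # Row i: the i-th coordinates of the basis vectors, followed by target[i].
    m = [[Fraction(cols[j][i]) for j in range(dim)] + [Fraction(target[i])]
         for i in range(dim)]
    for c in range(dim):
        p = next(i for i in range(c, dim) if m[i][c] != 0)  # StopIteration if singular
        m[c], m[p] = m[p], m[c]
        for i in range(dim):
            if i != c and m[i][c] != 0:
                factor = m[i][c] / m[c][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[c])]
    return [m[i][dim] / m[i][i] for i in range(dim)]


def glv4_decompose(k, basis):
    """Return u = (k, 0, 0, 0) - sum_j round(beta_j) * v_j, as in equation (1)."""
    betas = solve_rational(basis, (k, 0, 0, 0))
    b = [math.floor(beta + Fraction(1, 2)) for beta in betas]
    return [(k if i == 0 else 0) - sum(b[j] * basis[j][i] for j in range(4))
            for i in range(4)]


n, lam, mu = 1009, 374, 469
basis = [(6, 1, 1, -1), (1, -5, -1, -2), (-1, 1, 6, 1), (1, 2, 1, -5)]
k1, k2, k3, k4 = glv4_decompose(1000, basis)
# The 4-GLV reduction map F sends u back to k modulo n.
assert (k1 + k2 * lam + k3 * mu + k4 * lam * mu - 1000) % n == 0
print(k1, k2, k3, k4)
```

Each $|k_i|$ is at most half the sum of the $i$-th coordinates of the $|v_j|$, hence at most $2\max_j |v_j|$, which is where the factor 2 in (1) comes from.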

We focus next on the study of $\ker F$ in order to find a reduced basis $v_1, v_2, v_3, v_4$ with an explicit $C$. We can factor the 4-GLV map $F$ as

$$\mathbb{Z}^4 \xrightarrow{\ f\ } \mathbb{Z}[\Phi, \Psi] \xrightarrow{\ \text{reduction mod } \mathfrak{n} \cap \mathbb{Z}[\Phi, \Psi]\ } \mathbb{Z}/n$$

$$(x_1, x_2, x_3, x_4) \mapsto x_1 + x_2\Phi + x_3\Psi + x_4\Phi\Psi \mapsto x_1 + x_2\lambda + x_3\mu + x_4\lambda\mu \pmod{n}.$$
Notice that the kernel of the second map (reduction mod $\mathfrak{n} \cap \mathbb{Z}[\Phi, \Psi]$) is exactly $\mathfrak{n} \cap \mathbb{Z}[\Phi, \Psi]$. This can be seen as follows. The reduction map factors as

$$\mathbb{Z}[\Phi, \Psi] \to \mathcal{O}_K \to \mathcal{O}_K/\mathfrak{n} \cong \mathbb{Z}/n,$$

where the first arrow is inclusion and the second is reduction mod $\mathfrak{n}$, corresponding to reducing the $x_j$'s mod $\mathfrak{n} \cap \mathbb{Z} = n\mathbb{Z}$ and using $\Phi \equiv \lambda$, $\Psi \equiv \mu \pmod{\mathfrak{n}}$. But the kernel of this map consists precisely of the elements of $\mathbb{Z}[\Phi, \Psi]$ which are in $\mathfrak{n}$, and that is what we want.

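Moreover, since the reduction map is surjective, we obtain an isomorphism $\mathbb{Z}[\Phi, \Psi]/(\mathfrak{n} \cap \mathbb{Z}[\Phi, \Psi]) \cong \mathbb{Z}/n$, which says that the index of $\mathfrak{n} \cap \mathbb{Z}[\Phi, \Psi]$ inside $\mathbb{Z}[\Phi, \Psi]$ is $n$. Since the first map $f$ is an isomorphism, we get that $\ker F = f^{-1}(\mathfrak{n} \cap \mathbb{Z}[\Phi, \Psi])$ and that $\ker F$ has index $[\mathbb{Z}^4 : \ker F] = n$ inside $\mathbb{Z}^4$.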

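We can also produce a basis of $\ker F$ by the following observation. Let $\Phi' = \Phi - \lambda$ and $\Psi' = \Psi - \mu$, hence $\Phi'\Psi' = \Phi\Psi - \lambda\Psi - \mu\Phi + \lambda\mu$. In matrix form,

$$\begin{pmatrix} 1 \\ \Phi' \\ \Psi' \\ \Phi'\Psi' \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -\lambda & 1 & 0 & 0 \\ -\mu & 0 & 1 & 0 \\ \lambda\mu & -\mu & -\lambda & 1 \end{pmatrix} \begin{pmatrix} 1 \\ \Phi \\ \Psi \\ \Phi\Psi \end{pmatrix}.$$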

Since the determinant of the square matrix is 1, we deduce that $\mathbb{Z}[\Phi, \Psi] = \mathbb{Z}[\Phi', \Psi']$. But in this new basis, we claim that

$$\mathfrak{n} \cap \mathbb{Z}[\Phi', \Psi'] = n\mathbb{Z} + \mathbb{Z}\Phi' + \mathbb{Z}\Psi' + \mathbb{Z}\Phi'\Psi'.$$

Indeed, the reverse inclusion ($\supseteq$) is easy since $\Phi', \Psi', \Phi'\Psi' \in \mathfrak{n}$, and so is $n$, because $\mathfrak{n}$ dividing $n\mathcal{O}_K$ is equivalent to $\mathfrak{n} \supseteq n\mathcal{O}_K$. On the other hand, the index of both sides in $\mathbb{Z}[\Phi', \Psi']$ is $n$, which can only happen, once an inclusion is proved, if the two sides are equal. Using the isomorphism $f$, we see that a basis of $\ker F \subset \mathbb{Z}^4$ is therefore given by

$$w_1 = (n, 0, 0, 0), \quad w_2 = (-\lambda, 1, 0, 0), \quad w_3 = (-\mu, 0, 1, 0), \quad w_4 = (\lambda\mu, -\mu, -\lambda, 1).$$
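Since the basis $w_1, \dots, w_4$ is explicit, it can be written down and checked directly: each $w_i$ is sent to $0$ by $F$, and the lower-triangular matrix they form has determinant $n$, matching $[\mathbb{Z}^4 : \ker F] = n$. The sketch below uses illustrative toy values and hypothetical function names; it is only a consistency check, since LLL or the method of Section 6 must still be applied to obtain short vectors.

```python
def four_glv_map(x, n, lam, mu):
    """The 4-GLV reduction map F(x1, x2, x3, x4) = x1 + x2*lam + x3*mu + x4*lam*mu mod n."""
    x1, x2, x3, x4 = x
    return (x1 + x2 * lam + x3 * mu + x4 * lam * mu) % n


def kernel_basis(n, lam, mu):
    """The basis w1, ..., w4 of ker F obtained from Phi' = Phi - lam and Psi' = Psi - mu."""
    return [(n, 0, 0, 0), (-lam, 1, 0, 0), (-mu, 0, 1, 0), (lam * mu, -mu, -lam, 1)]


def det(m):
    """Integer determinant by cofactor expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


n, lam, mu = 1009, 374, 469  # toy values: lam^2 + lam + 1 = 0 and mu^2 = -1 (mod n)
ws = kernel_basis(n, lam, mu)
print([four_glv_map(w, n, lam, mu) for w in ws])  # [0, 0, 0, 0]
print(det([list(w) for w in ws]))                 # 1009, i.e. n
```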

The LLL algorithm [5] then finds, for a given basis $w_1, \dots, w_4$ of $\ker F$, a reduced$^9$ basis $v_1, \dots, v_4$ in polynomial time (in the logarithm of the norm of the $w_i$'s) such that (cf. [1, Theorem 2.6.2 p. 85])

$$\prod_{i=1}^{4} |v_i| \le 8\,[\mathbb{Z}^4 : \ker F] = 8n. \qquad (2)$$



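Lemma 1. Let

$$N\colon \mathbb{Z}^4 \to \mathbb{Z}, \qquad (x_1, x_2, x_3, x_4) \mapsto \sum_{\substack{i_1, i_2, i_3, i_4 \ge 0\\ i_1 + i_2 + i_3 + i_4 = 4}} b_{i_1, i_2, i_3, i_4}\, x_1^{i_1} x_2^{i_2} x_3^{i_3} x_4^{i_4}$$

be the norm of an element $x_1 + x_2\Phi + x_3\Psi + x_4\Phi\Psi \in \mathbb{Z}[\Phi, \Psi]$, where the $b_{i_1, i_2, i_3, i_4}$'s lie in $\mathbb{Z}$. Then, for any nonzero $v \in \ker F$, one has

$$|v| \ge \frac{n^{1/4}}{\left(\sum_{\substack{i_1, i_2, i_3, i_4 \ge 0\\ i_1 + i_2 + i_3 + i_4 = 4}} |b_{i_1, i_2, i_3, i_4}|\right)^{1/4}}. \qquad (3)$$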



9 The estimates are usually given for the Euclidean norm of the vectors. But it is easy to see that the rectangle norm is upper bounded by the Euclidean norm.
Proof. For $v \in \ker F$, the corresponding element of $\mathbb{Z}[\Phi, \Psi]$ lies in $\mathfrak{n}$, so its norm is divisible by $N(\mathfrak{n}) = n$, i.e. $N(v) \equiv 0 \pmod{n}$. If $v \ne 0$, then $N(v) \ne 0$, and we must therefore have $|N(v)| \ge n$. On the other hand, if (3) did not hold, then every component of $v$ would be strictly less in absolute value than the right-hand side of (3), and plugging this upper bound into the definition of $N(v)$ would yield $|N(v)| < n$, a contradiction. □

Let $B$ be the denominator of the right-hand side of (3). Then (2) and (3) imply that

$$|v_i| \le \frac{8n}{\prod_{j \ne i} |v_j|} \le 8B^3 n^{1/4}, \qquad i = 1, 2, 3, 4. \qquad (4)$$

Remark 1. In our case, where $\Psi^2 + 1 = 0$ and $\Phi^2 + r\Phi + s = 0$, we get as norm function

$$\begin{aligned} N(x_1, x_2, x_3, x_4) = {} & x_1^4 + s^2x_2^4 + x_3^4 + s^2x_4^4 - 2rx_1^3x_2 - 2rsx_1x_2^3 - 2rx_3^3x_4 - 2rsx_3x_4^3 \\ & + (r^2 + 2s)x_1^2x_2^2 + 2x_1^2x_3^2 + (r^2 - 2s)x_1^2x_4^2 + (r^2 - 2s)x_2^2x_3^2 + 2s^2x_2^2x_4^2 + (r^2 + 2s)x_3^2x_4^2 \\ & - 2rx_1^2x_3x_4 - 2rsx_2^2x_3x_4 - 2rx_1x_2x_3^2 - 2rsx_1x_2x_4^2 + 8sx_1x_2x_3x_4, \end{aligned}$$

and therefore

$$B = \left(4 + 4s^2 + 8s + 8|r| + 8|r|s + 2(r^2 + 2s) + 2|r^2 - 2s|\right)^{1/4}. \qquad (5)$$
From (1) and (4) we have proved the following theorem.

Theorem 2. Let $E/\mathbb{F}_p$ be a GLV curve and $E'/\mathbb{F}_{p^2}$ a twist, together with the two efficient endomorphisms $\Phi$ and $\Psi$, where everything is defined as at the start of Section 5. Suppose that the minimal polynomial of $\Phi$ is $X^2 + rX + s$. Let $P \in E'(\mathbb{F}_{p^2})$ be a generator of the large subgroup of prime order $n$. There exists an efficient algorithm which, for any $k \in [1, n]$, finds integers $k_1, k_2, k_3, k_4$ such that

$$kP = k_1P + k_2\Phi(P) + k_3\Psi(P) + k_4\Psi\Phi(P) \quad \text{with } \max_i |k_i| < 16B^3 n^{1/4}$$

and

$$B = \left(4 + 4s^2 + 8s + 8|r| + 8|r|s + 2(r^2 + 2s) + 2|r^2 - 2s|\right)^{1/4}.$$
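Remark 1 and the constant of Theorem 2 can be checked mechanically. The sketch below computes the norm in two stages, $N_{\mathbb{Q}(\Phi)/\mathbb{Q}}(\alpha^2 + \beta^2)$ with $\alpha = x_1 + x_2\Phi$ and $\beta = x_3 + x_4\Phi$ (this uses $\Psi^2 = -1$, so that $N_{K/\mathbb{Q}(\Phi)}(\alpha + \beta\Psi) = \alpha^2 + \beta^2$, and $N(a + b\Phi) = a^2 - rab + sb^2$), compares it with the expanded quartic form of Remark 1, and evaluates $B$ and $16B^3$. The function names are illustrative. Since both sides have degree at most 4 in each variable, agreement on the grid $\{-2, \dots, 2\}^4$ already forces them to be equal as polynomials.

```python
import itertools


def norm_via_towers(x, r, s):
    """Norm of x1 + x2*Phi + x3*Psi + x4*Phi*Psi, assuming Phi^2 + r*Phi + s = 0, Psi^2 = -1."""
    x1, x2, x3, x4 = x
    # alpha^2 + beta^2 = a + b*Phi after reducing Phi^2 = -r*Phi - s.
    a = x1 * x1 + x3 * x3 - s * (x2 * x2 + x4 * x4)
    b = 2 * (x1 * x2 + x3 * x4) - r * (x2 * x2 + x4 * x4)
    return a * a - r * a * b + s * b * b  # N(a + b*Phi) = a^2 - r*a*b + s*b^2


def norm_remark1(x, r, s):
    """The expanded quartic form of Remark 1."""
    x1, x2, x3, x4 = x
    return (x1**4 + s**2 * x2**4 + x3**4 + s**2 * x4**4
            - 2*r*x1**3*x2 - 2*r*s*x1*x2**3 - 2*r*x3**3*x4 - 2*r*s*x3*x4**3
            + (r*r + 2*s)*x1**2*x2**2 + 2*x1**2*x3**2 + (r*r - 2*s)*x1**2*x4**2
            + (r*r - 2*s)*x2**2*x3**2 + 2*s**2*x2**2*x4**2 + (r*r + 2*s)*x3**2*x4**2
            - 2*r*x1**2*x3*x4 - 2*r*s*x2**2*x3*x4 - 2*r*x1*x2*x3**2 - 2*r*s*x1*x2*x4**2
            + 8*s*x1*x2*x3*x4)


def constant_B(r, s):
    """B from equation (5)."""
    return (4 + 4*s*s + 8*s + 8*abs(r) + 8*abs(r)*s + 2*(r*r + 2*s) + 2*abs(r*r - 2*s)) ** 0.25


# (r, s) for Examples 2-6 (Example 1 is excluded in Section 5).
for r, s in [(1, 1), (-1, 2), (0, 2), (-1, 3), (0, 3)]:
    assert all(norm_via_towers(x, r, s) == norm_remark1(x, r, s)
               for x in itertools.product(range(-2, 3), repeat=4))
    print(r, s, round(constant_B(r, s), 4), round(16 * constant_B(r, s) ** 3, 1))
```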



6 A Tale of Two Cornacchia Algorithms 

In view of the fact that the LLL algorithm (running in $O(\log^3 n)$) is rather inefficient compared to other dedicated algorithms in dimension less than five, we can ask ourselves whether we can sharpen the bound of Theorem 2 and provide an explicit description of a simpler algorithm to find a short basis of $\ker F$. This is the scope of the present section. Our algorithm has a running time of $O(\log^2 n)$, and will produce a proved bound greatly improving the $16B^3$ of Theorem 2.

The idea is to modify the original GLV approach, which finds a short basis using an extended Euclidean algorithm. We find that in this case we need to perform two such algorithms: one in $\mathbb{Z}$, as in the original GLV paper, and the other one in $\mathbb{Z}[i]$, the Gaussian integers. The main difficulty here lies in the correct choice of the remainders in the Gaussian gcd algorithm, since we don't have a canonical way to choose a "positive" one.

In contrast to Section 5, where we worked with generic endomorphisms $\Phi, \Psi$ generating a biquadratic field, we will strongly use here the fact that $\Psi^2 + 1 = 0$. We will denote indifferently by the letter $i$ the usual square root of $-1$ in $\mathbb{C}$, the integer mod $n$ such that $\Psi(P) = iP$, as well as the endomorphism $\Psi$. In particular, for $z = a + ib \in \mathbb{Z}[i]$, we let $zP = aP + ibP = aP + b\Psi(P)$. The context will make clear each time which of these interpretations we are referring to. These differences notwithstanding, we suppose that we are set as in the first paragraph of Section 5.
6.1 The Euclidean Algorithm in Z

The first step is to find $\nu = a + ib \in \mathbb{Z}[i]$ such that $|\nu|^2 = a^2 + b^2 = n$, i.e. a Gaussian prime above $n$. Recall that $n$ splits in $\mathbb{Z}[i]$. Let $\nu = a + ib$ be a prime above $n$. We can furthermore assume that $\nu P = aP + biP = aP + b\Psi(P) = 0$. Indeed $\nu\bar{\nu}P = nP = 0$; hence either $\bar{\nu}P$ is a nonzero multiple of $P$, and therefore $\nu P = 0$, or else $\bar{\nu}P = 0$, so that in any case one of the Gaussian primes (WLOG $\nu$) above $n$ will have $\nu P = 0$. We can find $\nu$ by Cornacchia's algorithm [1, Section 1.5.2], which is a truncated form of the GLV algorithm. For completeness and consistency with what will follow, we recall how this is done.

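Let $\mu \in [1, n]$ be such that $\mu \equiv i \pmod{n}$, with $i$ being defined by $\Psi(P) = iP$. Actually, in the GLS approach [2], it has been pointed out that this value of $\mu$ can be readily computed from $\#E(\mathbb{F}_p)$. The extended Euclidean algorithm to compute the gcd of $n$ and $\mu$ produces three terminating sequences of integers $(r_j)_{j \ge 0}$, $(s_j)_{j \ge 0}$ and $(t_j)_{j \ge 0}$ such that

$$\begin{pmatrix} r_{j+2} & s_{j+2} & t_{j+2} \\ r_{j+1} & s_{j+1} & t_{j+1} \end{pmatrix} = \begin{pmatrix} -q_{j+1} & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} r_{j+1} & s_{j+1} & t_{j+1} \\ r_j & s_j & t_j \end{pmatrix}, \quad j \ge 0, \qquad (6)$$

for some integer $q_{j+1} > 0$ and initial data

$$\begin{pmatrix} r_1 & s_1 & t_1 \\ r_0 & s_0 & t_0 \end{pmatrix} = \begin{pmatrix} \mu & 0 & 1 \\ n & 1 & 0 \end{pmatrix}. \qquad (7)$$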

This means that at step $j \ge 0$,

$$r_j = q_{j+1} r_{j+1} + r_{j+2},$$

and similarly for the other sequences. The sequence $(q_j)_{j \ge 1}$ is uniquely defined by imposing that the previous equation be the integer division of $r_j$ by $r_{j+1}$. In other terms, $q_{j+1} = \lfloor r_j / r_{j+1} \rfloor$. This implies by induction that all the sequences are well defined in the integers, together with the following properties.

Lemma 2. The sequences $(r_j)_{j \ge 0}$, $(s_j)_{j \ge 0}$ and $(t_j)_{j \ge 0}$ defined by (6) and (7) with $q_{j+1} = \lfloor r_j / r_{j+1} \rfloor$ satisfy the following properties, valid for all $j \ge 0$.

1. $r_j > r_{j+1} \ge 0$ and $q_{j+1} \ge 1$,
2. $(-1)^j s_j \ge 0$ and $|s_j| \le |s_{j+1}|$ (this last inequality valid for $j \ge 1$),
3. $(-1)^{j+1} t_j \ge 0$ and $|t_j| \le |t_{j+1}|$,
4. $s_{j+1} r_j - s_j r_{j+1} = (-1)^{j+1} r_1$,
5. $t_{j+1} r_j - t_j r_{j+1} = (-1)^j r_0$,
6. $r_0 s_j + r_1 t_j = r_j$.

These properties lie at the heart of the original GLV algorithm. They imply in particular via 1. that the algorithm terminates (once $r_j$ reaches zero), and that it has $O(\log n)$ steps, as $r_j = q_{j+1} r_{j+1} + r_{j+2} \ge r_{j+1} + r_{j+2} > 2r_{j+2}$. Note that 1., 2. & 3. imply that 4. & 5. can be rewritten in our case respectively as

$$|s_{j+1} r_j| + |s_j r_{j+1}| = \mu \quad \text{and} \quad |t_{j+1} r_j| + |t_j r_{j+1}| = n. \qquad (8)$$
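The sequences (6)-(7) and the identities of Lemma 2 and (8) are easy to check numerically. The sketch below runs the extended Euclidean algorithm on $(n, \mu)$ until the remainder vanishes and asserts each property in turn; the toy pair $n = 1009$, $\mu = 469$ (a square root of $-1$ modulo $n$) and the function names are illustrative assumptions.

```python
def euclid_sequences(n, mu):
    """Sequences (r_j), (s_j), (t_j) of (6)-(7), run until the remainder reaches 0."""
    r, s, t = [n, mu], [1, 0], [0, 1]
    while r[-1] != 0:
        q = r[-2] // r[-1]  # q_{j+1} = floor(r_j / r_{j+1})
        r.append(r[-2] - q * r[-1])
        s.append(s[-2] - q * s[-1])
        t.append(t[-2] - q * t[-1])
    return r, s, t


def check_lemma2(n, mu):
    """Assert properties 1-6 of Lemma 2 and identity (8); return True if all hold."""
    r, s, t = euclid_sequences(n, mu)
    last = len(r) - 1  # r[last] == 0
    for j in range(last):
        assert r[j] > r[j + 1] >= 0                                          # 1.
        assert (-1) ** j * s[j] >= 0                                         # 2. (sign)
        if j >= 1:
            assert abs(s[j]) <= abs(s[j + 1])                                # 2. (growth)
        assert (-1) ** (j + 1) * t[j] >= 0 and abs(t[j]) <= abs(t[j + 1])    # 3.
        assert s[j + 1] * r[j] - s[j] * r[j + 1] == (-1) ** (j + 1) * mu     # 4.
        assert t[j + 1] * r[j] - t[j] * r[j + 1] == (-1) ** j * n            # 5.
        assert abs(s[j + 1] * r[j]) + abs(s[j] * r[j + 1]) == mu             # (8), left
        assert abs(t[j + 1] * r[j]) + abs(t[j] * r[j + 1]) == n              # (8), right
    assert all(n * s[j] + mu * t[j] == r[j] for j in range(last + 1))       # 6.
    return True


print(check_lemma2(1009, 469))  # True
```

Note that the inequalities in 2. and 3. can be equalities (for instance $|s_1| = 0 < |s_2| = 1 = |s_3|$ when $q_2 = 1$), so they are checked non-strictly.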

The Cornacchia (as well as the GLV) algorithm does not make use of the full sequences $(r_j)$, $(s_j)$ and $(t_j)$ but rather stops at the $m \ge 0$ such that $r_m \ge \sqrt{n}$ and $r_{m+1} < \sqrt{n}$. An application of (8) with $j = m$ yields $|t_{m+1} r_m| \le n$, hence $|t_{m+1}| \le \sqrt{n}$. Since by 6. we have $r_{m+1} - \mu t_{m+1} = n s_{m+1} \equiv 0 \pmod{n}$, we deduce that $r_{m+1}^2 + t_{m+1}^2 \equiv (r_{m+1} - \mu t_{m+1})(r_{m+1} + \mu t_{m+1}) \equiv 0 \pmod{n}$. Moreover $t_{m+1} \ne 0$ by 3., so that $0 < r_{m+1}^2 + t_{m+1}^2 < n + n = 2n$, which therefore implies that $r_{m+1}^2 + t_{m+1}^2 = n$ and finally that $\nu = r_{m+1} - i\,t_{m+1}$.

We present here the pseudo-code of this Euclidean algorithm in $\mathbb{Z}$.

Algorithm 1 (Cornacchia's GCD in Z)

Input: n ≡ 1 (mod 4) prime, 1 < μ < n such that μ² ≡ −1 (mod n).
Output: ν = ν_(R) + i ν_(I), a Gaussian prime dividing n, such that νP = 0.

1. initialize:
     r0 ← n,  r1 ← μ,  r2 ← n,
     t0 ← 0,  t1 ← 1,  t2 ← 0.
2. main loop:
     while r1² > n do
         q ← ⌊r0 / r1⌋,
         r2 ← r0 − q·r1,  r0 ← r1,  r1 ← r2,
         t2 ← t0 − q·t1,  t0 ← t1,  t1 ← t2.
3. return:
     ν = r1 − i·t1,  ν_(R) = r1,  ν_(I) = −t1.
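A direct Python transcription of Algorithm 1 follows. It is a sketch (the function name is hypothetical) that mirrors the pseudo-code step by step in exact integer arithmetic; the loop test $r_1^2 > n$ is the integer form of $r_1 \ge \sqrt{n}$, the two agreeing because a prime is never a perfect square. The square root of $-1$ is found by brute force, which is only sensible for toy sizes.

```python
def cornacchia_z(n: int, mu: int) -> tuple[int, int]:
    """Algorithm 1: from mu^2 = -1 (mod n), return (nu_R, nu_I) with
    nu_R^2 + nu_I^2 = n and nu_R + mu * nu_I = 0 (mod n)."""
    if n % 4 != 1 or (mu * mu + 1) % n != 0:
        raise ValueError("need n = 1 (mod 4) and mu^2 = -1 (mod n)")
    r0, r1 = n, mu
    t0, t1 = 0, 1
    while r1 * r1 > n:            # i.e. r1 >= sqrt(n)
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1  # r2 <- r0 - q*r1, r0 <- r1, r1 <- r2
        t0, t1 = t1, t0 - q * t1  # t2 <- t0 - q*t1, t0 <- t1, t1 <- t2
    return r1, -t1                # nu = r1 - i*t1


n = 1009
mu = next(x for x in range(2, n) if (x * x + 1) % n == 0)  # brute force, toy sizes only
print(mu, cornacchia_z(n, mu))  # 469 (28, 15)
```

For $n = 1009$ the remainders run $1009, 469, 71, 43, 28$, so the loop stops at $r_1 = 28 < \sqrt{1009}$ and returns $\nu = 28 + 15i$, with $28^2 + 15^2 = 1009$.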



6.2 The Euclidean Algorithm in Z[i]

In the previous subsection we have given a meaning to $zP$, where $z \in \mathbb{Z}[i]$, and we have seen how to construct $\nu$, a Gaussian prime such that $\nu P = 0$. By identifying$^{10}$ $(x_1, x_2, x_3, x_4) \in \mathbb{Z}^4$ with $(z_1, z_2) = (x_1 + ix_3, x_2 + ix_4) \in \mathbb{Z}[i]^2$, we can rewrite the 4-GLV reduction map $F$ of Section 5 as (using the same letter $F$ by abuse of notation)

$$F\colon \mathbb{Z}[i]^2 \to \mathbb{Z}[i]/\nu \cong \mathbb{Z}/n, \qquad (z_1, z_2) \mapsto z_1 + \lambda z_2 \pmod{\nu}.$$

This $F$ should be compared to the map $f$ of Section 2. Mimicking the original GLV paper [3], we are drawn to applying the extended Euclidean algorithm (defined exactly as before, with integer divisions occurring in $\mathbb{Z}[i]$, henceforth denoted EGEA, short for extended Gaussian Euclidean algorithm) to the pair $(r_0, r_1) = (\lambda, \nu)$ if $\lambda > \sqrt{2}\,|\nu|$ and $(r_0, r_1) = (\lambda + n, \nu)$ otherwise (the latter case being exceptionally rare). We should note that 4., 5. & 6. of Lemma 2 still hold and 1. holds in modulus (in particular the algorithm terminates). However, in the analysis of this algorithm, especially in [9], a crucial role is played by (8), which yields a bound on $|s_{j+1} r_j|$ and $|s_j r_{j+1}|$ out of a bound on

$$s_{j+1} r_j - s_j r_{j+1} = (-1)^{j+1} \nu \qquad (9)$$



10 It is important to keep in mind that this association is only an isomorphism of abelian groups ($\mathbb{Z}$-modules). However, $\mathbb{Z}[i]^2$ is also endowed with a structure of $\mathbb{Z}[i]$-module.
in the present case. This fact, as we saw, stems from the alternating sign of the sequence $(s_j)$, which results from taking a canonical form of integer division with positive quotients $q_{j+1}$ and nonnegative remainders $r_{j+2}$, a property which is not available here. Nevertheless, we can still use a similar reasoning using (9), provided that the arguments of $s_{j+1} r_j$ and $s_j r_{j+1}$ are not too close, so as to avoid a high degree of cancellation.

The first observation is that in the case of Gaussian integers there can be 2, 3 or 4 possible choices for a remainder in the $j$-th step of the integer division $r_j = q_{j+1} r_{j+1} + r_{j+2}$. It turns out that choosing at each step $j \ge 0$ of the EGEA a remainder $r_{j+2}$ with smallest modulus will yield the following decomposition theorem.

Theorem 3. In the notations of Theorem 2, the lattice reduction consisting of Cornacchia's algorithm in $\mathbb{Z}$ with positive remainders (Algorithm 1) and in $\mathbb{Z}[i]$ with smallest remainders (Algorithms 2 and 3) runs in $O(\log^2 n)$ binary operations and will result in a decomposition of any $k \in [1, n]$ into integers $k_1, k_2, k_3, k_4$ such that

$$kP = k_1P + k_2\Phi(P) + k_3\Psi(P) + k_4\Psi\Phi(P) \quad \text{with } \max_i |k_i| < 103\left(\sqrt{1 + |r| + s}\right) n^{1/4}.$$

We give here the pseudo-code of Cornacchia's algorithm in $\mathbb{Z}[i]$ in two forms: working with complex numbers, and separating real and imaginary parts.

Algorithm 2 (Cornacchia's algorithm in Z[i] - compact form)

Input: ν Gaussian prime dividing the rational prime n, 1 < λ < n such that λ² + rλ + s ≡ 0 (mod n).
Output: Two Z[i]-linearly independent vectors v1 & v2 of ker F ⊂ Z[i]² of rectangle norms < 51.5 (√(1 + |r| + s)) n^(1/4).

1. initialize:
     if λ² > 2n then r0 ← λ else r0 ← λ + n,
     r1 ← ν,  r2 ← n,
     s0 ← 1,  s1 ← 0,  s2 ← 0,
     q ← 0.
2. main loop:
     while |r2|⁴ (1 + |r| + s)² > n do
         q ← closest Gaussian integer to r0 / r1,
         r2 ← r0 − q·r1,  r0 ← r1,  r1 ← r2,
         s2 ← s0 − q·s1,  s0 ← s1,  s1 ← s2.
3. return:
     v1 = (r0, −s0),  v2 = (r1, −s1).
Algorithm 3 (Cornacchia's algorithm in Z[i] - real & imaginary parts)

Input: ν Gaussian prime dividing the rational prime n, 1 < λ < n such that λ² + rλ + s ≡ 0 (mod n).
Output: Four Z-linearly independent vectors v1, v2, v3 and v4 ∈ ker F ⊂ Z⁴ of rectangle norms < 51.5 (√(1 + |r| + s)) n^(1/4).

1. initialize:
     if λ² > 2n then r0,(R) ← λ else r0,(R) ← λ + n,
     r0,(I) ← 0,
     r1,(R) ← ν(R),  r1,(I) ← ν(I),
     r2,(R) ← n,  r2,(I) ← 0,
     s0,(R) ← 1,  s0,(I) ← 0,
     s1,(R) ← 0,  s1,(I) ← 0,
     s2,(R) ← 0,  s2,(I) ← 0,
     q(R) ← 0,  q(I) ← 0.
2. main loop:
     while (r2,(R)⁴ + 2 r2,(R)² r2,(I)² + r2,(I)⁴)(1 + |r| + s)² > n do
         q(R) ← ⌊(r0,(R) r1,(R) + r0,(I) r1,(I)) / (r1,(R)² + r1,(I)²)⌉,
         q(I) ← ⌊(r0,(I) r1,(R) − r0,(R) r1,(I)) / (r1,(R)² + r1,(I)²)⌉,
         r2,(R) ← r0,(R) − (q(R) r1,(R) − q(I) r1,(I)),
         r2,(I) ← r0,(I) − (q(R) r1,(I) + q(I) r1,(R)),
         r0,(R) ← r1,(R),  r1,(R) ← r2,(R),
         r0,(I) ← r1,(I),  r1,(I) ← r2,(I),
         s2,(R) ← s0,(R) − (q(R) s1,(R) − q(I) s1,(I)),
         s2,(I) ← s0,(I) − (q(R) s1,(I) + q(I) s1,(R)),
         s0,(R) ← s1,(R),  s1,(R) ← s2,(R),
         s0,(I) ← s1,(I),  s1,(I) ← s2,(I).
3. return:
     v1 = (r0,(R), −s0,(R), r0,(I), −s0,(I)),  v2 = (r1,(R), −s1,(R), r1,(I), −s1,(I)),
     v3 = (−r0,(I), s0,(I), r0,(R), −s0,(R)),  v4 = (−r1,(I), s1,(I), r1,(R), −s1,(R)).
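For concreteness, here is a Python sketch of Algorithm 2 that represents Gaussian integers as integer pairs $(x_{(R)}, x_{(I)})$, so that it follows the real/imaginary form of Algorithm 3 as well. It assumes $\nu$ and $\lambda$ are already known (for instance $\nu$ from Algorithm 1); the helper names are hypothetical, and ties in rounding $r_0/r_1$ to the closest Gaussian integer are broken upward, a choice the pseudo-code leaves open. The toy parameters are $n = 1009$, $r = s = 1$, $\lambda = 374$, $\mu = 469$ and $\nu = 28 + 15i$, for which $28 + 15\mu \equiv 0 \pmod n$.

```python
def gmul(a, b):
    """Product of Gaussian integers a = (aR, aI) and b = (bR, bI)."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])


def gsub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def gnorm(a):
    return a[0] * a[0] + a[1] * a[1]


def nearest(num, den):
    """Nearest integer to num/den for den > 0 (ties rounded up), in exact arithmetic."""
    return (2 * num + den) // (2 * den)


def closest_quotient(a, b):
    """Closest Gaussian integer to a/b, computed from a*conj(b) / |b|^2."""
    d = gnorm(b)
    return (nearest(a[0] * b[0] + a[1] * b[1], d), nearest(a[1] * b[0] - a[0] * b[1], d))


def cornacchia_gaussian(n, lam, nu, r, s):
    """Algorithms 2/3: four vectors of ker F in Z^4, ordered as (x1, x2, x3, x4)."""
    c = (1 + abs(r) + s) ** 2
    r0 = (lam, 0) if lam * lam > 2 * n else (lam + n, 0)
    r1, r2 = nu, (n, 0)
    s0, s1 = (1, 0), (0, 0)
    while gnorm(r2) ** 2 * c > n:      # |r2|^4 (1 + |r| + s)^2 > n
        q = closest_quotient(r0, r1)
        r2 = gsub(r0, gmul(q, r1))
        r0, r1 = r1, r2
        s2 = gsub(s0, gmul(q, s1))
        s0, s1 = s1, s2
    # (z1, z2) = (x1 + i*x3, x2 + i*x4); v3 and v4 are i*v1 and i*v2.
    return [(r0[0], -s0[0], r0[1], -s0[1]), (r1[0], -s1[0], r1[1], -s1[1]),
            (-r0[1], s0[1], r0[0], -s0[0]), (-r1[1], s1[1], r1[0], -s1[0])]


n, r, s, lam, mu = 1009, 1, 1, 374, 469
vs = cornacchia_gaussian(n, lam, (28, 15), r, s)
print(vs)  # [(6, 1, 1, -1), (1, -5, -1, -2), (-1, 1, 6, 1), (1, 2, 1, -5)]
```

Here the loop performs three Gaussian divisions (remainders $4 + 18i$, $6 + i$, $1 - i$) before $|r_2|^4 \cdot 9 \le n$, and every returned vector satisfies $x_1 + x_2\lambda + x_3\mu + x_4\lambda\mu \equiv 0 \pmod n$.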



7 Proof of Theorem 3 

This section is devoted to proving that Algorithms 2 and 3 produce a reduced basis of $\ker F$ of rectangle norm $< 51.5\left(\sqrt{1 + |r| + s}\right) n^{1/4}$. The proof of the decomposition of $k$ then follows from the deduction recalled in Section 5.
Let us note first, about the running time, that the extended Euclidean algorithm is known to run in $O(\log^2 n)$ bit operations. The same analysis will also show that its Gaussian version runs in $O(\log^2 n)$ bit operations, since its number of steps is also logarithmic. In short, this works as follows: if $b_j = \lfloor \log_2 |r_j| \rfloor$ (i.e. the bitsize of $|r_j|$), then step $j$ of the EGEA requires finding $q_{j+1}$ and then $r_{j+2}$. One can show that integer division of two $h$-bit Gaussian integers with an $\ell$-bit quotient runs in $O(h(\ell + 1))$ binary operations. Finding $q_{j+1}$ has therefore a runtime $O(b_j(c_{j+1} + 1))$, where $c_{j+1} = \lfloor \log_2 |q_{j+1}| \rfloor = b_j - b_{j+1} + O(1)$. Similarly, knowing $q_{j+1}$, computing $r_{j+2}$ can be done in $O(b_{j+1}c_{j+1}) + O(b_{j+1}) = O(b_{j+1}(b_j - b_{j+1})) + O(b_{j+1})$. If $S = O(\log n)$ is the number of steps of the EGEA, the total runtime is less than a constant times

$$\sum_{j=0}^{S} \bigl(b_j(b_j - b_{j+1}) + b_j\bigr) = O(b_0^2 + b_0 S) = O(\log^2 n).$$

In the following, whenever $z \in \mathbb{C}^*$, its argument $\arg(z)$ will always be chosen in $(-\pi, \pi]$. By lattice square we mean a square of side length one with vertices in $\mathbb{Z}[i]$. We single out eight exceptional lattice squares, which are those lattice squares with a vertex of modulus 1 (that is $\pm 1$ or $\pm i$) but not containing the origin as a vertex. Our analysis of the EGEA rests on the following lemmas.

Lemma 3 (A geometric property of squares). There exists an absolute real constant $\theta \approx 2.45861$ (with $2\arctan 2 < \theta$) such that, for any point $P$ of a lattice square different from the vertices, letting $V_1$ be the closest vertex to $P$, there exists another vertex $V_2 \ne V_1$ with $\theta \le \widehat{V_1PV_2} \le \pi$. (Note that $V_1P \le 1/\sqrt{2}$.)




Fig. 1. 



Proof. This is one case where a picture is worth a thousand words. We refer to Figure 1 for a visual explanation of why the argument works. The dotted and dashed circle arcs are centred on the vertices and have radius $1/\sqrt{2}$. The plain circle arcs have the following property: for any point $P$ on them, the two square vertices $V$ and $V'$ belonging to them make an angle of $\theta$ with $P$; in other terms $\widehat{VPV'} = \theta$. Therefore points between two bigger arcs (in one of the two almond-shaped regions) "look" at the diagonally opposite vertices marking the intersection of these arcs with an angle between $\theta$ and $\pi$. We then choose the closest vertex to get a distance $\le 1/\sqrt{2}$. In case $P$ is at the intersection of the two almond-shaped regions (in the "blown square"), we may have to choose one region where one of the vertices is at distance $\le 1/\sqrt{2}$, but this is always possible, since the dashed and dotted disks cover everything. Finally, if $P$ does not belong to the union of the two almond-shaped regions, then it lies inside one of the smaller plain disks, where its angle between two appropriate consecutive vertices will also be between $\theta$ and $\pi$. Furthermore, by choosing the closest vertex $V_1$ to $P$, we have $V_1P \le 1/\sqrt{2}$. □

It remains to explain how we can calculate $\theta$, or rather the values $\sin\theta$ and $\cos\theta$ of the usual
trigonometric functions (which is what we really need later), since we can show that these are
algebraic numbers expressible by radicals, whereas $\theta/\pi \notin \mathbb{Q}$.

We concentrate on finding the Cartesian coordinates of $R = (1/2, 1 - u/2)$, appearing in Figure 1,
supposing the vertices are the origin, $(1, 0)$, $(1, 1)$ and $(0, 1)$. Our aim is then to find $u \in (0, 1)$. A
look at Figure 2 shows the disposition of the angles, so that $u = \cot(\theta/2)$ and $2 - u = \cot(3\theta/2 - \pi) =
\cot(3\theta/2)$. The triplication formula for the cotangent, $\cot 3x = (\cot^3 x - 3\cot x)/(3\cot^2 x - 1)$, then
shows that $u$ satisfies the equation

$$u + \frac{3u - u^3}{1 - 3u^2} = 2 \iff 2u^3 - 3u^2 - 2u + 1 = 0 .$$

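Solving it yields that the root we are looking for is

$$u = \frac{1}{2} - \operatorname{Re} c + \sqrt{3}\,\operatorname{Im} c \approx 0.3554157, \qquad c = \sqrt[3]{\frac{1}{8} + \frac{\sqrt{237}}{36}\, i} ,$$

where the determination of the cube root is the one in the first quadrant.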
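As a numerical cross-check of $u$ and $\theta$, the sketch below solves $2u^3 - 3u^2 - 2u + 1 = 0$ on $(0, 1)$ by bisection, recovers $\theta = 2\,\mathrm{arccot}(u)$, and compares $u$ with the Cardano-type expression $u = 1/2 - \operatorname{Re} c + \sqrt{3}\,\operatorname{Im} c$, where $c$ is the first-quadrant cube root of $1/8 + i\sqrt{237}/36$ (Python's principal complex power returns exactly that determination). It uses floating-point arithmetic only, so it confirms digits rather than proving anything.

```python
import math


def cubic(u):
    # 2u^3 - 3u^2 - 2u + 1, obtained from u + cot(3*theta/2) = 2 with u = cot(theta/2)
    return 2 * u**3 - 3 * u**2 - 2 * u + 1


def bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


u = bisect(cubic, 0.0, 1.0)      # cubic(0) = 1 > 0 and cubic(1) = -2 < 0
theta = 2 * math.atan(1 / u)     # u = cot(theta/2) with theta/2 in (0, pi/2)

# Closed form using the principal (first-quadrant) cube root c.
c = complex(1 / 8, math.sqrt(237) / 36) ** (1 / 3)
u_closed = 0.5 - c.real + math.sqrt(3) * c.imag

print(u, u_closed, theta)
```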




Fig. 2. 



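Remark 2. One can see that $\theta/\pi \notin \mathbb{Q}$ in the following way. Since $u = \cot(\theta/2)$ with $\theta/2 \in (0, \pi/2)$, we have

$$\frac{\theta}{2} = \arg(u + i), \qquad\text{that is,}\qquad e^{i\theta} = \frac{u + i}{u - i} ,$$

and supposing, for a contradiction, that $\theta/\pi \in \mathbb{Q}$, we would have that $e^{i\theta}$ is a root of unity. The preceding
equality, rewritten as $u = i\,(e^{i\theta} + 1)/(e^{i\theta} - 1)$, shows that then $\cot(\theta/2)$ belongs to a cyclotomic extension of $\mathbb{Q}$, whose Galois group
is abelian. But we have seen that the irreducible polynomial of $\cot(\theta/2)$ is $2x^3 - 3x^2 - 2x + 1$,
with discriminant 316, not a rational square. Therefore its Galois group is the nonabelian $S_3$, a
contradiction.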
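Both ingredients of Remark 2 can be checked with exact rational arithmetic: a cubic over $\mathbb{Q}$ without rational roots is irreducible, and an irreducible cubic has Galois group $S_3$ exactly when its discriminant is not a square in $\mathbb{Q}$. The sketch below applies the rational root test and the standard discriminant formula for $ax^3 + bx^2 + cx + d$.

```python
from fractions import Fraction
from math import isqrt


def poly(x):
    return 2 * x**3 - 3 * x**2 - 2 * x + 1


def cubic_discriminant(a, b, c, d):
    """Discriminant of a*x^3 + b*x^2 + c*x + d."""
    return 18 * a * b * c * d - 4 * b**3 * d + b**2 * c**2 - 4 * a * c**3 - 27 * a**2 * d**2


# Rational root test: a rational root p/q (in lowest terms) must have p | 1 and q | 2.
candidates = [Fraction(sign, q) for sign in (1, -1) for q in (1, 2)]
irreducible = not any(poly(x) == 0 for x in candidates)  # a cubic with no rational root

D = cubic_discriminant(2, -3, -2, 1)
D_is_square = D >= 0 and isqrt(D) ** 2 == D

print(irreducible, D, D_is_square)
```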

Remark 3. When applying Lemma 3, it is essential that we be able to choose, from the set of all
vertices of the lattice square, which ones are the adequate $V_1$ and $V_2$. Since the only excluded
quotient $q_j$ is zero, we must be careful to avoid all four squares which have the origin
as a vertex. But this follows from the fact that at all steps $j > 0$ we always have $|r_j/r_{j+1}| \ge \sqrt{2}$.

Define $\Theta = \arctan 2 - \pi/3$ and $A = 1/\sin\Theta = 2\sqrt{5}\,(8 + 5\sqrt{3})/\sqrt{13 + 4\sqrt{3}} = 2\sqrt{5}\,(2 + \sqrt{3}) \approx 16.6902$. In the
following analysis of the EGEA, it will be useful to make the following distinction between indices.
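The constants $\Theta$ and $A$ can be evaluated directly. Since $\sin(\arctan 2 - \pi/3) = (2 - \sqrt{3})/(2\sqrt{5})$, the radical expression for $A$ simplifies to $2\sqrt{5}(2 + \sqrt{3})$. The sketch below checks this numerically, together with the identity $2\arctan 2 - 2\Theta = 2\pi/3$ (which appears to be the reason for this choice of $\Theta$, since it gives exactly the $2\pi/3$ margin needed in the exceptional cases of Lemma 5) and the factor $2\sqrt{2}(A + 1) + 1 < 51.5$ behind the constant $C$ in the proof of Theorem 3. The value of $\theta$ is the numerical one from Lemma 3.

```python
import math

THETA = math.atan(2) - math.pi / 3                    # Theta = arctan 2 - pi/3
A = 1 / math.sin(THETA)
A_radicals = 2 * math.sqrt(5) * (8 + 5 * math.sqrt(3)) / math.sqrt(13 + 4 * math.sqrt(3))
A_simplified = 2 * math.sqrt(5) * (2 + math.sqrt(3))  # from sin(Theta) = (2 - sqrt 3)/(2 sqrt 5)

theta = 2.45861                                       # constant of Lemma 3 (numerical value)
margin_generic = theta - 2 * THETA                    # angle margin for non-exceptional squares
margin_exceptional = 2 * math.atan(2) - 2 * THETA     # equals 2*pi/3 exactly

C_factor = 2 * math.sqrt(2) * (A + 1) + 1             # C = C_factor * sqrt(1 + |r| + s)

print(f"Theta = {THETA:.6f}, A = {A:.4f}, C_factor = {C_factor:.3f}")
print(margin_generic > 2 * math.pi / 3, math.isclose(margin_exceptional, 2 * math.pi / 3))
```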

Definition 1 (Good and bad $j$'s). A step $j \ge 0$ of the EGEA will be called bad if, during the
$(j-1)$-th step, among all four choices of $q_j$ as a vertex of the lattice square containing $r_{j-1}/r_j$ (and
the consequent choice of $r_{j+1}$ and $s_{j+1}$, noting that for the purpose of this definition we do not require
that $|r_{j+1}| < |r_j|$), we always have $s_j s_{j+1} r_{j+1} \neq 0$ and

$$\left| \arg\left( \frac{s_{j+1} r_j}{s_j r_{j+1}} \right) \right| < \Theta .$$

Otherwise $j$ is called good.

Remark 4. Note that $j = 0$ and $j = 1$ are always good, since $s_1 = 0$.

Lemma 4 (Use of good $j$'s). If $j$ is good then for some choice of $r'_{j+1}$ (and corresponding $s'_{j+1}$) we
have

$$|s'_{j+1} r_j - s_j r'_{j+1}| \ge \sin\Theta \, \max\bigl(|s'_{j+1} r_j|, |s_j r'_{j+1}|\bigr)$$

and, therefore, if we choose an $r_{j+1}$ of smallest modulus, then

$$\max\bigl(|s_j r_{j+1}|, |s_{j+1} r_j|\bigr) \le (A + 1)|\nu| .$$

Proof. Notice that the result holds trivially if $s_j s'_{j+1} r'_{j+1} = 0$. Otherwise, this is a straightforward
application of a general inequality about complex numbers that we can express as follows: let
$\zeta \in \mathbb{C}^*$ with $\pi \ge |\arg(\zeta)| \ge \Theta$. We claim that under these conditions, $|1 - \zeta| \ge \sin\Theta$. Indeed,
writing $\zeta = \rho e^{i\psi}$ with $\Theta \le |\psi| \le \pi$, we have

$$|1 - \zeta|^2 = (1 - \rho e^{i\psi})(1 - \rho e^{-i\psi}) = 1 - 2\rho\cos\psi + \rho^2 .$$

First note that we can suppose $|\psi| \le \pi/2$, otherwise clearly $|1 - \zeta| \ge 1$. The last expression, when
viewed as a quadratic polynomial in $\rho$, has minimum (over $\mathbb{R}$) equal to $-\Delta/4 = -(4\cos^2\psi - 4)/4 =
\sin^2\psi \ge \sin^2\Theta$. Therefore $|1 - \zeta| \ge \sin\Theta$, thereby proving our claim. The first part of the lemma
follows by applying the claim to $\zeta = s'_{j+1} r_j / (s_j r'_{j+1})$ and $\zeta = s_j r'_{j+1} / (s'_{j+1} r_j)$ successively.
For the second part, note that $s'_{j+1} r_j - s_j r'_{j+1} = s_{j-1} r_j - s_j r_{j-1}$ does not depend on the choice of the
quotient and, by (14), has modulus $|\nu|$. Hence

$$|s_j r_{j+1}| \le |s_j r'_{j+1}| \le A\, |s'_{j+1} r_j - s_j r'_{j+1}| = A|\nu|$$

and therefore

$$|s_{j+1} r_j| = |s_{j+1} r_j - s_j r_{j+1} + s_j r_{j+1}| \le |s_{j+1} r_j - s_j r_{j+1}| + |s_j r_{j+1}| \le |\nu| + A|\nu| .$$

□ 
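The inequality behind Lemma 4 and Remark 5 can be illustrated numerically. The sketch samples $\zeta = \rho e^{i\psi}$ with $\Theta \le |\psi| \le \pi$ and checks $|1 - \zeta| \ge \sin\Theta$. It also shows that the bound is attained at $\rho = \cos\Theta$, $\psi = \Theta$ (the vertex of the quadratic $1 - 2\rho\cos\psi + \rho^2$), and it tests the equivalent form of Remark 5 for a few angles. Random sampling is only a sanity check, not a proof; the sample sizes and seeds are arbitrary choices made for this sketch.

```python
import cmath
import math
import random

THETA = math.atan(2) - math.pi / 3


def min_distance_to_one(psi_min, trials=20000, seed=0):
    """Smallest |1 - zeta| over random zeta = rho*e^{i psi}, psi_min <= |psi| <= pi, 0 <= rho <= 3."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(trials):
        psi = rng.uniform(psi_min, math.pi) * rng.choice((-1, 1))
        rho = rng.uniform(0.0, 3.0)
        best = min(best, abs(1 - cmath.rect(rho, psi)))
    return best


def remark5_holds(psi, trials=20000, seed=1):
    """Sample xi with |xi| < sin(psi) and check that |arg(1 - xi)| < psi."""
    rng = random.Random(seed)
    for _ in range(trials):
        xi = cmath.rect(rng.uniform(0.0, math.sin(psi)), rng.uniform(-math.pi, math.pi))
        if abs(cmath.phase(1 - xi)) >= psi:
            return False
    return True


# Equality case: rho = cos(Theta) minimises 1 - 2*rho*cos(Theta) + rho^2.
extremal = abs(1 - cmath.rect(math.cos(THETA), THETA))

print(min_distance_to_one(THETA), extremal, math.sin(THETA))
print([remark5_holds(psi) for psi in (math.pi / 6, math.pi / 4, math.pi / 2)])
```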
Remark 5. We have seen in the course of the proof of the preceding lemma the following fact: if
$\zeta \in \mathbb{C}^*$ with $\pi \ge |\arg(\zeta)| \ge \psi$, then $|1 - \zeta| \ge \sin\psi$. This is equivalent to the following assertion
(set $\zeta = 1 - \xi$), used in the proof of the next lemma: if $|\xi| < \sin\psi$, then $|\arg(1 - \xi)| < \psi$.

The next result is crucial in controlling what happens when things go "uncontrolled". Its proof 
is rather elaborate. 

Lemma 5 (Bad-$j$ behaviour of $s_j$). If $j$ is bad, then

$$|s_{j+1}| \le 2\sqrt{2}\, |s_{j-1}|$$

and

$$|s_j| \le |s_{j-1}| .$$

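Proof. We first suppose that the point $P$ of affix $r_{j-1}/r_j$ does not belong to an exceptional lattice
square. Let $V_1$ and $V_2$ be as in Lemma 3, of affixes $q_j$ and $q'_j$ respectively. Upon defining $r'_{j+1} =
r_{j-1} - q'_j r_j$, since $r_{j+1} = r_{j-1} - q_j r_j$, Lemma 3 states that

$$\pi \ge \left| \arg\left( \frac{q_j - r_{j-1}/r_j}{q'_j - r_{j-1}/r_j} \right) \right| = \left| \arg\left( \frac{r_{j+1}}{r'_{j+1}} \right) \right| \ge \theta .$$

By definition of "bad" we have, denoting $s'_{j+1} = s_{j-1} - q'_j s_j$,

$$\left| \arg\left( \frac{s_{j+1} r_j}{s_j r_{j+1}} \right) \right| < \Theta \quad\text{and}\quad \left| \arg\left( \frac{s'_{j+1} r_j}{s_j r'_{j+1}} \right) \right| < \Theta ,$$

and, taking the quotient of these two numbers, this yields

$$\left| \arg\left( \frac{s_{j+1} r'_{j+1}}{s'_{j+1} r_{j+1}} \right) \right| < 2\Theta .$$

We deduce from

$$\frac{s_{j+1}}{s'_{j+1}} = \frac{s_{j+1} r'_{j+1}}{s'_{j+1} r_{j+1}} \cdot \frac{r_{j+1}}{r'_{j+1}} ,$$

where the second factor has an argument of absolute value between $\theta$ and $\pi$ (reducing modulo $2\pi$ if necessary), that

$$\left| \arg\left( \frac{s_{j+1}}{s'_{j+1}} \right) \right| > \theta - 2\Theta > \frac{2\pi}{3} . \tag{10}$$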



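Now assume that $|s_j| > |s_{j-1}|$. Then $|q_j s_j| > \sqrt{2}\,|s_{j-1}|$ and $|q'_j s_j| > \sqrt{2}\,|s_{j-1}|$, since the quotients are
Gaussian integers of modulus different from zero or one, hence of modulus at least $\sqrt{2}$. Furthermore, since there is at most
one Gaussian integer of modulus $\sqrt{2}$ in a lattice square, we have that either $|q_j s_j| > 2|s_{j-1}|$ or
$|q'_j s_j| > 2|s_{j-1}|$. Therefore, by Remark 5 (with $\psi = \pi/4$, respectively $\psi = \pi/6$),

$$\left| \arg\left( 1 - \frac{s_{j-1}}{q_j s_j} \right) \right| < \frac{\pi}{4}$$

and similarly

$$\left| \arg\left( 1 - \frac{s_{j-1}}{q'_j s_j} \right) \right| < \frac{\pi}{4} ,$$

with at least one of them being $< \pi/6$. Since $s_{j+1} = -q_j s_j \bigl(1 - s_{j-1}/(q_j s_j)\bigr)$ and $s'_{j+1} = -q'_j s_j \bigl(1 - s_{j-1}/(q'_j s_j)\bigr)$, we then get, using that $|\arg(q_j/q'_j)| \le \pi/4$,

$$\left| \arg\left( \frac{s_{j+1}}{s'_{j+1}} \right) \right| \le \left| \arg\left( \frac{q_j}{q'_j} \right) \right| + \left| \arg\left( 1 - \frac{s_{j-1}}{q_j s_j} \right) \right| + \left| \arg\left( 1 - \frac{s_{j-1}}{q'_j s_j} \right) \right| < \frac{\pi}{4} + \frac{\pi}{4} + \frac{\pi}{6} = \frac{2\pi}{3} ,$$

contradicting (10).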

Exceptional Choices of $V_1$ and $V_2$:

We now discuss the exceptional cases when $q_j$ or $q'_j$ is one of $\pm 1, \pm i$, which need to be handled ad hoc.

By symmetry (without loss of generality), we place ourselves in the case when $r_{j-1}/r_j$ lies in the
lattice square of vertices $i$, $1 + i$, $1 + 2i$, $2i$. It then belongs to one of the five regions labeled 1 to 5 in
Figure 3. Note that the open grey region is off-limits, since $|r_{j-1}/r_j| \ge \sqrt{2}$. Each of these regions
contains two lattice points, which as before we will denote $V_1$ for the one closest to the point $P$ of
affix $r_{j-1}/r_j$ and $V_2$ for the other one (in case $P$ lies on the boundary between two or more zones,
their distinction is immaterial). Note that $V_1$ is closest to $P$ among all four vertices.







Fig. 3. 



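Region 1 (delimited by the triangle of vertices $i$, $1/4 + 3i/2$, $2i$): In this case, $|\arg(r_{j+1}/r'_{j+1})| =
\widehat{V_1 P V_2} \ge 2\arctan 2$. Supposing, to fix notation, that $q_j = i$ and $q'_j = 2i$, we have, assuming that
$|s_j| > |s_{j-1}|$ and using Remark 5,

$$\left| \arg\left( 1 - \frac{s_{j-1}}{q_j s_j} \right) \right| < \frac{\pi}{2} \quad\text{and}\quad \left| \arg\left( 1 - \frac{s_{j-1}}{q'_j s_j} \right) \right| < \frac{\pi}{6} .$$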
18 P. Birkner, P. Longa, F. Sica 



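On the other hand, a reasoning similar to the one leading to (10), with the value $2\arctan 2$ instead
of $\theta$, will show that again $|\arg(s_{j+1}/s'_{j+1})| > 2\pi/3$, which leads to a contradiction since

$$\left| \arg\left( \frac{s_{j+1}}{s'_{j+1}} \right) \right| \le \left| \arg\left( \frac{q_j}{q'_j} \right) \right| + \left| \arg\left( 1 - \frac{s_{j-1}}{q_j s_j} \right) \right| + \left| \arg\left( 1 - \frac{s_{j-1}}{q'_j s_j} \right) \right| < 0 + \frac{\pi}{2} + \frac{\pi}{6} = \frac{2\pi}{3} .$$

The other four cases are treated similarly and we briefly outline them.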

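Region 2 (delimited by the triangle of vertices $2i$, $1/2 + 7i/4$, $1 + 2i$): Here $|\arg(r_{j+1}/r'_{j+1})| \ge 2\arctan 2$.
Letting $q_j = 2i$, $q'_j = 1 + 2i$, one can show, assuming that $|s_j| > |s_{j-1}|$, that

$$\frac{2\pi}{3} < \left| \arg\left( \frac{s_{j+1}}{s'_{j+1}} \right) \right| \le \left| \arg\left( \frac{q_j}{q'_j} \right) \right| + \left| \arg\left( 1 - \frac{s_{j-1}}{q_j s_j} \right) \right| + \left| \arg\left( 1 - \frac{s_{j-1}}{q'_j s_j} \right) \right| < \left( \frac{\pi}{2} - \arctan 2 \right) + \frac{\pi}{6} + \left( \frac{\pi}{2} - \arctan 2 \right) < \frac{\pi}{2} < \frac{2\pi}{3} ,$$

a contradiction.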

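Region 3 (delimited by the triangle of vertices $1 + 2i$, $3/4 + 3i/2$, $1 + i$): Here $|\arg(r_{j+1}/r'_{j+1})| \ge
2\arctan 2$. Letting $q_j = 1 + i$, $q'_j = 1 + 2i$, one can show, assuming that $|s_j| > |s_{j-1}|$, that

$$\frac{2\pi}{3} < \left| \arg\left( \frac{s_{j+1}}{s'_{j+1}} \right) \right| \le \left| \arg\left( \frac{q_j}{q'_j} \right) \right| + \left| \arg\left( 1 - \frac{s_{j-1}}{q_j s_j} \right) \right| + \left| \arg\left( 1 - \frac{s_{j-1}}{q'_j s_j} \right) \right| < \arctan\frac{1}{3} + \frac{\pi}{4} + \left( \frac{\pi}{2} - \arctan 2 \right) = \frac{\pi}{2} < \frac{2\pi}{3} ,$$

a contradiction.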

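Region 4 (the red zone): Here $|\arg(r_{j+1}/r'_{j+1})| \ge \pi - \arctan 2 + \arctan(2/3)$. Letting $q_j = i$, $q'_j =
1 + 2i$, one can show, assuming that $|s_j| > |s_{j-1}|$ (the reasoning leading to (10) now gives the lower bound $\pi - \arctan 2 + \arctan(2/3) - 2\Theta$), that

$$\frac{5\pi}{3} - 3\arctan 2 + \arctan\frac{2}{3} < \left| \arg\left( \frac{s_{j+1}}{s'_{j+1}} \right) \right| \le \left| \arg\left( \frac{q_j}{q'_j} \right) \right| + \left| \arg\left( 1 - \frac{s_{j-1}}{q_j s_j} \right) \right| + \left| \arg\left( 1 - \frac{s_{j-1}}{q'_j s_j} \right) \right| < \left( \frac{\pi}{2} - \arctan 2 \right) + \frac{\pi}{2} + \left( \frac{\pi}{2} - \arctan 2 \right) < \frac{5\pi}{3} - 3\arctan 2 + \arctan\frac{2}{3} ,$$

a contradiction.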
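Region 5 (the yellow zone): Here $|\arg(r_{j+1}/r'_{j+1})| \ge \pi - \arctan(4/7)$. Letting $q_j = 1 + i$, $q'_j = 2i$,
one can show, assuming that $|s_j| > |s_{j-1}|$ (and noting that $\arctan(4/7) = \arctan 2 - \arctan(2/3)$, so that the lower bound is the same as in Region 4), that

$$\frac{5\pi}{3} - 3\arctan 2 + \arctan\frac{2}{3} < \left| \arg\left( \frac{s_{j+1}}{s'_{j+1}} \right) \right| \le \left| \arg\left( \frac{q_j}{q'_j} \right) \right| + \left| \arg\left( 1 - \frac{s_{j-1}}{q_j s_j} \right) \right| + \left| \arg\left( 1 - \frac{s_{j-1}}{q'_j s_j} \right) \right| < \frac{\pi}{4} + \frac{\pi}{4} + \frac{\pi}{6} = \frac{2\pi}{3} < \frac{5\pi}{3} - 3\arctan 2 + \arctan\frac{2}{3} ,$$

a contradiction.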
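The angle bookkeeping in the generic case and in Regions 1 to 5 comes down to comparing two explicit numbers: the lower bound on $|\arg(s_{j+1}/s'_{j+1})|$ coming from the badness of $j$ (the angle bound for $r_{j+1}/r'_{j+1}$ minus $2\Theta$), and the upper bound obtained from $|q_j|$, $|q'_j|$ and $\arg(q_j/q'_j)$. The Python sketch below evaluates both sides for each case. Since both inequalities in the proof are strict, a lower bound at least as large as the upper bound suffices for the contradiction. The value $\theta \approx 2.45861$ is the numerical constant of Lemma 3.

```python
import math

pi, atan = math.pi, math.atan
THETA = atan(2) - pi / 3
theta = 2.45861  # numerical value of the constant of Lemma 3

# name -> (lower bound from badness, upper bound |arg(q/q')| + two Remark 5 terms)
cases = {
    "generic": (theta - 2 * THETA, pi / 4 + pi / 4 + pi / 6),
    "region 1": (2 * atan(2) - 2 * THETA, 0 + pi / 2 + pi / 6),
    "region 2": (2 * atan(2) - 2 * THETA, (pi / 2 - atan(2)) + pi / 6 + (pi / 2 - atan(2))),
    "region 3": (2 * atan(2) - 2 * THETA, atan(1 / 3) + pi / 4 + (pi / 2 - atan(2))),
    "region 4": (pi - atan(2) + atan(2 / 3) - 2 * THETA,
                 (pi / 2 - atan(2)) + pi / 2 + (pi / 2 - atan(2))),
    "region 5": (pi - atan(4 / 7) - 2 * THETA, pi / 4 + pi / 4 + pi / 6),
}

for name, (lower, upper) in cases.items():
    # Both bounds are strict in the proof, so lower >= upper already gives a contradiction.
    print(f"{name}: lower={lower:.4f} upper={upper:.4f} contradiction={lower >= upper - 1e-12}")
```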

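We have thus proved that in any case $|s_j| \le |s_{j-1}|$. To show the first part of the lemma, we
proceed similarly, although there is a slight difference. We assume at first that $|s_{j+1}| > 2|s_{j-1}|$ and
$|s'_{j+1}| > \sqrt{2}\,|s_{j-1}|$. Then, by Remark 5,

$$\left| \arg\left( 1 - \frac{s_{j-1}}{s_{j+1}} \right) \right| < \frac{\pi}{6} \quad\text{and}\quad \left| \arg\left( 1 - \frac{s_{j-1}}{s'_{j+1}} \right) \right| < \frac{\pi}{4} .$$

Proceeding as previously, since $q_j s_j = -s_{j+1}\bigl(1 - s_{j-1}/s_{j+1}\bigr)$ and $q'_j s_j = -s'_{j+1}\bigl(1 - s_{j-1}/s'_{j+1}\bigr)$,

$$\left| \arg\left( \frac{s_{j+1}}{s'_{j+1}} \right) \right| \le \left| \arg\left( \frac{q_j}{q'_j} \right) \right| + \left| \arg\left( 1 - \frac{s_{j-1}}{s_{j+1}} \right) \right| + \left| \arg\left( 1 - \frac{s_{j-1}}{s'_{j+1}} \right) \right| < \frac{\pi}{4} + \frac{\pi}{6} + \frac{\pi}{4} = \frac{2\pi}{3} ,$$

again contradicting (10), which also holds in the exceptional cases, as we have just seen. Therefore
$|s_{j+1}| \le 2|s_{j-1}|$ or $|s'_{j+1}| \le \sqrt{2}\,|s_{j-1}|$ (or both). In the first case, we are done. Otherwise, since
$s_{j+1} + q_j s_j = s'_{j+1} + q'_j s_j = s_{j-1}$, we derive

$$|s_{j+1}| \le |s'_{j+1}| + |q_j - q'_j|\,|s_j| \le \sqrt{2}\,|s_{j-1}| + \sqrt{2}\,|s_{j-1}| = 2\sqrt{2}\,|s_{j-1}|$$

by the already proved second part of the lemma and the fact that $q_j, q'_j$ correspond to two vertices
of the same lattice square, so that $|q_j - q'_j| \le \sqrt{2}$. □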

Lemma 6 (Lower bound on generic vectors of $\ker F$). For any nonzero $(z_1, z_2) \in \ker F$ we
have

$$\max(|z_1|, |z_2|) \ge \frac{\sqrt{|\nu|}}{\sqrt{1 + |r| + s}} .$$

In particular, for any $j \ge 0$ we have

$$\max(|r_j|, |s_j|) \ge \frac{\sqrt{|\nu|}}{\sqrt{1 + |r| + s}} .$$
Proof. This proof uses an argument already appearing in the proof of the original GLV algorithm,
see [9], as well as Lemma 1. If $(0, 0) \neq (z_1, z_2) \in \ker F$ then $z_1 + \lambda z_2 \equiv 0 \pmod{\nu}$. If $\lambda'$ is the other
root of $X^2 + rX + s \pmod{n}$, we get that

$$z_1^2 - r z_1 z_2 + s z_2^2 = (z_1 + \lambda z_2)(z_1 + \lambda' z_2) \equiv 0 \pmod{\nu} .$$

Since $X^2 + rX + s$ is irreducible over $\mathbb{Q}(i)$, because the two quadratic fields are linearly disjoint, the left-hand side is a nonzero Gaussian integer divisible by $\nu$, and we
therefore have $|z_1^2 - r z_1 z_2 + s z_2^2| \ge |\nu|$. On the other hand, if

$$\max(|z_1|, |z_2|) < \frac{\sqrt{|\nu|}}{\sqrt{1 + |r| + s}}$$

then

$$|z_1^2 - r z_1 z_2 + s z_2^2| \le |z_1|^2 + |r|\,|z_1|\,|z_2| + s\,|z_2|^2 < |\nu| ,$$

a contradiction. To show the second part, it suffices to note that since $r_0 s_j + \nu t_j = r_j$ (where, as
mentioned previously, $r_0 = \lambda$ or $\lambda + n$), we have that

$$0 \equiv \nu t_j = r_j - r_0 s_j \equiv r_j - \lambda s_j \pmod{\nu}$$

so that $(r_j, -s_j) \in \ker F$ for every $j \ge 0$. □

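Proof (of Theorem 3). It remains here to show the improved bound, which brings us to finding four
$\mathbb{Q}$-linearly independent vectors of $\ker F$ of rectangle norm bounded by $Cn^{1/4}$. Define $m \ge 1$ as the
index such that

$$|r_m| \ge \frac{\sqrt{|\nu|}}{\sqrt{1 + |r| + s}} \quad\text{and}\quad |r_{m+1}| < \frac{\sqrt{|\nu|}}{\sqrt{1 + |r| + s}} . \tag{11}$$

Let us consider an index $j \le m$. If it is good, then by Lemma 4 we have $|s_{j+1} r_j| \le (A + 1)|\nu|$ and
therefore, since $(|r_j|)$ is a decreasing sequence, $|s_{j+1}| \le (A + 1)|\nu|/|r_m| \le (A + 1)\sqrt{1 + |r| + s}\,\sqrt{|\nu|}$, so that in particular

$$|s_{j+1}| \le 2\sqrt{2}(A + 1)\sqrt{1 + |r| + s}\,\sqrt{|\nu|} . \tag{12}$$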



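On the other hand, if it is bad, then let $l < j$ be the largest good index less than $j$ (it exists by Remark 4). By Lemma 5
(applied to the bad indices $j, j - 1, \dots, l + 1$) and Lemma 4 (applied to $l$) we have

$$|s_{j+1}| \le 2\sqrt{2}\,|s_{j-1}| \le 2\sqrt{2}\,|s_{j-2}| \le \cdots \le 2\sqrt{2}\,|s_l| \le 2\sqrt{2}\,\frac{(A + 1)|\nu|}{|r_{l+1}|} \le 2\sqrt{2}(A + 1)\sqrt{1 + |r| + s}\,\sqrt{|\nu|} ,$$

therefore in any case (12) holds. Applying this to $j = m - 1$ and $j = m$, we find that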



$$\max(|s_m|, |s_{m+1}|) \le 2\sqrt{2}(A + 1)\sqrt{1 + |r| + s}\,\sqrt{|\nu|} . \tag{13}$$

Moreover, using

$$s_{m+1} r_m - s_m r_{m+1} = (-1)^{m+1} \nu \tag{14}$$

and (11), (13), we deduce

$$|s_{m+1} r_m| \le |\nu| + |s_m r_{m+1}| \le |\nu| + 2\sqrt{2}(A + 1)|\nu| .$$
In addition, by Lemma 6 (since $|r_{m+1}|$ lies below the bound in (11)) we must have

$$|s_{m+1}| \ge \frac{\sqrt{|\nu|}}{\sqrt{1 + |r| + s}} ,$$

which therefore implies that

$$|r_m| \le \bigl(2\sqrt{2}(A + 1) + 1\bigr)\sqrt{1 + |r| + s}\,\sqrt{|\nu|} .$$

This last inequality, together with (13) and (11), shows that the two vectors $v_1 = (r_m, -s_m)$, $v_2 =
(r_{m+1}, -s_{m+1}) \in \ker F \subset \mathbb{Z}[i]^2$ have rectangle norms bounded by $C\sqrt{|\nu|} = Cn^{1/4}$, for $C =
(2\sqrt{2}(A + 1) + 1)\sqrt{1 + |r| + s} < 51.5\sqrt{1 + |r| + s}$ (that these two vectors belong to $\ker F$ was
shown in the proof of Lemma 6).

We can find two more vectors by noticing that (14) implies that $v_1$ and $v_2$ are $\mathbb{Q}(i)$-linearly
independent. Therefore, the vectors $v_1, v_2, v_3 = iv_1, v_4 = iv_2$ are $\mathbb{Q}$-linearly independent. They all
belong to $\ker F$ and have rectangle norms bounded by $Cn^{1/4}$. Since the modulus of a Gaussian integer
upper-bounds the absolute values of its real and imaginary parts, the corresponding vectors in $\mathbb{Z}^4$ also have rectangle norms
bounded by $Cn^{1/4}$, thus concluding the proof of the theorem, since these are exactly the four vectors
returned by Algorithm 3. □
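Algorithm 3 itself is not reproduced in this part of the paper, so the following is only a hypothetical Python sketch of the reduction used in the proof above, under these assumptions: Gaussian integers are stored as exact integer pairs; the EGEA starts from $r_0 = \lambda$, $r_1 = \nu$ with $s_0 = 1$, $s_1 = 0$ (consistent with Remark 4 and with $r_0 s_j + \nu t_j = r_j$); each quotient $q_j$ is the Gaussian integer nearest to $r_{j-1}/r_j$, which yields the $r_{j+1}$ of smallest modulus; and the loop stops at the index $m$ defined by (11). The function names and the toy parameters ($n = 13 = N(3 + 2i)$, $\lambda = 3$, a root of $X^2 + X + 1$ modulo 13) are illustrative choices, not taken from the paper.

```python
def g_mul(a, b):
    """Product of Gaussian integers a = (re, im) and b = (re, im)."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])


def g_sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def g_neg(a):
    return (-a[0], -a[1])


def g_conj(a):
    return (a[0], -a[1])


def g_norm(a):
    """N(a) = |a|^2."""
    return a[0] * a[0] + a[1] * a[1]


def round_div(x, d):
    """Integer nearest to x/d for d > 0 (halves rounded up)."""
    return (2 * x + d) // (2 * d)


def nearest_quotient(a, b):
    """Gaussian integer nearest to a/b, i.e. the closest vertex of the lattice square containing a/b."""
    num, d = g_mul(a, g_conj(b)), g_norm(b)
    return (round_div(num[0], d), round_div(num[1], d))


def short_kernel_vectors(lam, nu, r_coef, s_coef):
    """Hypothetical sketch: EGEA from r_0 = lam, r_1 = nu, stopped at the index m of (11).

    Keeps s_j with r_j = lam*s_j + nu*t_j and returns v1 = (r_m, -s_m),
    v2 = (r_{m+1}, -s_{m+1}), i*v1 and i*v2 (pairs of Gaussian integers).
    """
    k = 1 + abs(r_coef) + s_coef

    def above_bound(z):
        # |z| >= sqrt(|nu|)/sqrt(k)  <=>  (k * N(z))^2 >= N(nu), checked exactly
        return (k * g_norm(z)) ** 2 >= g_norm(nu)

    r_prev, r_cur = (lam, 0), nu
    s_prev, s_cur = (1, 0), (0, 0)
    while above_bound(r_cur):
        q = nearest_quotient(r_prev, r_cur)
        r_prev, r_cur = r_cur, g_sub(r_prev, g_mul(q, r_cur))
        s_prev, s_cur = s_cur, g_sub(s_prev, g_mul(q, s_cur))

    v1 = (r_prev, g_neg(s_prev))
    v2 = (r_cur, g_neg(s_cur))

    def times_i(v):
        return (g_mul((0, 1), v[0]), g_mul((0, 1), v[1]))

    return [v1, v2, times_i(v1), times_i(v2)]


def in_ker_F(v, lam, nu):
    """(z1, z2) lies in ker F iff z1 + lam*z2 is divisible by nu in Z[i]."""
    w = (v[0][0] + lam * v[1][0], v[0][1] + lam * v[1][1])
    t, n = g_mul(w, g_conj(nu)), g_norm(nu)
    return t[0] % n == 0 and t[1] % n == 0


# Toy parameters: n = 13 = N(3 + 2i) and lam = 3 is a root of X^2 + X + 1 mod 13.
vectors = short_kernel_vectors(3, (3, 2), 1, 1)
print(vectors, all(in_ker_F(v, 3, (3, 2)) for v in vectors))
```

Working with exact integer pairs rather than Python's `complex` avoids floating-point rounding in the quotients, which would break the Euclidean steps once $n$ has cryptographic size.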

Remark 6. Let us note that, since we are in dimension less than 5, Nguyen and Stehlé [11] have
also produced an algorithm which finds vectors realizing the successive minima of $\ker F$ with a running time
$O(\log^2 n)$. However, it does not seem to give an explicit bound on their length applicable to our case.

Remark 7. One may doubt the pertinence of securing a faster lattice reduction algorithm
for the GLV lattice $\ker F$. Indeed, at the present moment, the GLV method has been applied by
choosing a fixed curve in the parameters and performing the lattice reduction offline. However, it is
quite possible that in the future some new cryptosystem will require an online curve agreement, by
counting points over a suitable field and subsequently performing the lattice reduction. For this reason, and
given the human tendency to push away limitations, we consider that our previous argument, in addition
to its formal elegance, may eventually find a useful application.



8 Performance Estimates 



In this section, we assess the performance of the four-dimensional GLV method in comparison with the
traditional and two-dimensional cases. For our analysis, let us consider the curve $E: y^2 = x^3 + b$
over a quadratic extension field of large prime characteristic, using a pseudo-Mersenne prime $p$
such that $-1$ is a quadratic non-residue mod $p$, for efficiency purposes. Let us define the following
notation: (i) M, S, A and I represent field multiplication, squaring, addition and inversion over $\mathbb{F}_p$,
respectively, and (ii) m, s, a and i represent the same operations over $\mathbb{F}_{p^2}$. On the curve above, the
expected costs of scalar multiplication at the 128-bit security level in terms of $\mathbb{F}_{p^2}$ operations on
one processor core are given by

- One core non-GLV: 256DBL + 42.5mADD + C_Precomp + C_Affine = 1108m + 1152s + 2090a + (1i +
64m + 19s + 56a) + (1i + 3m + 1s) = 2i + 1175m + 1172s + 2146a,

- One core 2-GLV: 128DBL + 43.5mADD + C_Precomp + C_Affine = 732m + 643s + 1201a + (1i +
78m + 19s + 63a) + (1i + 3m + 1s) = 2i + 813m + 663s + 1264a,

- One core 4-GLV: 64DBL + 45.5mADD + C_Precomp + C_Affine = 556m + 393s + 767a + (1i + 106m +
19s + 77a) + (1i + 3m + 1s) = 2i + 665m + 413s + 844a,

corresponding to the fully sequential executions without GLV, with 2-GLV and with 4-GLV,
respectively. Note that DBL and mADD represent point doubling and mixed addition, whose
costs are 3m + 4s + 7a and 8m + 3s + 7a, respectively, when using Jacobian coordinates. C_Precomp and
C_Affine represent the cost of precomputation and final conversion to affine coordinates, respectively.
The costs above assume the use of interleaving (INT) [3] with width-w non-adjacent form (wNAF)
using 7 precomputed points (w = 5) and the use of the LM scheme for precomputing those points
[8].

Similarly, the expected cost of 4-GLV on four cores is

- Four core 4-GLV: 64DBL + 12.5mADD + C_Precomp + C_Affine = 292m + 294s + 536a + (1i + 69m +
19s + 58a) + (1i + 3m + 1s) = 2i + 364m + 314s + 594a.

Thus, a steady cost reduction can be seen when switching from a non-GLV based implementation
to 2- and 4-GLV. A similar improvement is observed when increasing the number of cores in
the case of 4-GLV. Optimal performance is ultimately achieved with this method running on four
cores. For instance, following implementation results on an AMD Phenom II X4 940, we have that
1i ≈ 50m, 1s ≈ 0.70m and 1a ≈ 0.2m when using a 127-bit prime. In this case, non-GLV, 2-GLV
and 4-GLV cost 2525m, 1630m and 1223m on one core, respectively. On the other hand, four-core
4-GLV costs 803m. Hence, a remarkable 3x speedup is expected when using 4-GLV on four cores
in comparison with a traditional execution on one core.
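The operation counts above are simple weighted sums, so they are easy to recompute. The following Python sketch rebuilds each estimate from the DBL and mADD costs, the precomputation costs and the affine conversion, and converts the result into $\mathbb{F}_{p^2}$ multiplications with the quoted ratios 1i ≈ 50m, 1s ≈ 0.70m and 1a ≈ 0.2m. Because the text rounds fractional counts (such as 42.5 × 3s) before summing, the recomputed totals land within about one m of the quoted values.

```python
# Operation counts over F_{p^2} as (i, m, s, a) tuples.
DBL = (0, 3, 4, 7)      # Jacobian doubling: 3m + 4s + 7a
MADD = (0, 8, 3, 7)     # mixed addition: 8m + 3s + 7a
AFFINE = (1, 3, 1, 0)   # final conversion to affine: 1i + 3m + 1s


def combine(*terms):
    """Sum (coefficient, (i, m, s, a)) pairs into a single (i, m, s, a) count."""
    return tuple(sum(k * ops[idx] for k, ops in terms) for idx in range(4))


def in_m(ops, inv=50.0, sqr=0.70, add=0.2):
    """Express a count in F_{p^2} multiplications (1i = 50m, 1s = 0.70m, 1a = 0.2m)."""
    i, m, s, a = ops
    return inv * i + m + sqr * s + add * a


# (#DBL, #mADD, precomputation cost as (i, m, s, a)) for each scenario in the text.
SCENARIOS = {
    "1 core, non-GLV": (256, 42.5, (1, 64, 19, 56)),
    "1 core, 2-GLV": (128, 43.5, (1, 78, 19, 63)),
    "1 core, 4-GLV": (64, 45.5, (1, 106, 19, 77)),
    "4 cores, 4-GLV": (64, 12.5, (1, 69, 19, 58)),
}

costs = {
    name: in_m(combine((n_dbl, DBL), (n_madd, MADD), (1, pre), (1, AFFINE)))
    for name, (n_dbl, n_madd, pre) in SCENARIOS.items()
}

for name, cost in costs.items():
    print(f"{name}: {cost:.1f}m")
```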

Note that, in comparison to $\mathbb{F}_{p^2}$ arithmetic using a 128-bit prime (such as the one used in [4]), $\mathbb{F}_{p^2}$
multiplication with our 127-bit prime is expected to be faster, since internal field additions can be carried out without carry
checks and lazy reduction applies efficiently (see also [7]); however, other operations get slightly
more expensive because reduction involves a few extra shift and rotate operations.

Let us compare performance against a similar curve using the original GLV method over $\mathbb{F}_p$.
In this case the expected costs when using one and two cores are (1I ≈ 200M, 1S ≈ 0.85M and
1A ≈ 0.2M on the targeted platform)

- One core standard 2-GLV: 128DBL + 37.2ADD + 5.3mADD + C_Precomp + C_Affine = 836M +
640S + 1194A + (51M + 26S + 56A) + (1I + 3M + 1S) = 1I + 890M + 667S + 1250A = 1907M,

- Two core standard 2-GLV: 128DBL + 19ADD + 3.5mADD + C_Precomp + C_Affine = 621M + 580S +
1054A + (44M + 26S + 56A) + (1I + 3M + 1S) = 1I + 668M + 607S + 1110A = 1606M,

where DBL, mADD and ADD cost 3M + 4S + 7A, 8M + 3S + 7A and 11M + 3S + 7A, respectively, in the
case of Jacobian coordinates. We assume the use of wNAF with window width w = 5 and the LM
precomputation scheme without inversions [6, Ch. 3]. Since in our case we have observed that in
practice 1M ≈ 0.75m, the scaled costs of one-core and two-core standard 2-GLV are equivalent to
1430m and 1205m, respectively. This means that on one core the 4-GLV method is expected to
compute scalar multiplication in about 0.86 of the time of the standard 2-GLV. Similarly, an optimal
execution of 4-GLV on four cores is expected to run in about 0.67 of the time of the optimal execution
of the standard 2-GLV on two cores.
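The same bookkeeping applies to the standard 2-GLV estimates over $\mathbb{F}_p$, with 1I ≈ 200M, 1S ≈ 0.85M, 1A ≈ 0.2M and the observed scaling 1M ≈ 0.75m. The Python sketch below recomputes the one-core and two-core costs and the ratios against the 4-GLV figures of 1223m and 803m quoted earlier in this section. As before, small differences from the text come from the rounding of fractional counts.

```python
# Operation counts over F_p as (I, M, S, A) tuples (Jacobian coordinates).
DBL = (0, 3, 4, 7)      # 3M + 4S + 7A
ADD = (0, 11, 3, 7)     # 11M + 3S + 7A
MADD = (0, 8, 3, 7)     # 8M + 3S + 7A
AFFINE = (1, 3, 1, 0)   # 1I + 3M + 1S


def combine(*terms):
    """Sum (coefficient, (I, M, S, A)) pairs into a single (I, M, S, A) count."""
    return tuple(sum(k * ops[idx] for k, ops in terms) for idx in range(4))


def in_M(ops, inv=200.0, sqr=0.85, add=0.2):
    """Express a count in F_p multiplications (1I = 200M, 1S = 0.85M, 1A = 0.2M)."""
    i, m, s, a = ops
    return inv * i + m + sqr * s + add * a


one_core = in_M(combine((128, DBL), (37.2, ADD), (5.3, MADD), (1, (0, 51, 26, 56)), (1, AFFINE)))
two_core = in_M(combine((128, DBL), (19, ADD), (3.5, MADD), (1, (0, 44, 26, 56)), (1, AFFINE)))

M_TO_m = 0.75                      # observed on the test platform: 1M ~ 0.75m
one_core_m = M_TO_m * one_core     # about 1430m
two_core_m = M_TO_m * two_core     # about 1205m

# Ratios against the 4-GLV estimates quoted in the text (1223m on one core, 803m on four).
print(round(1223 / one_core_m, 2), round(803 / two_core_m, 2))
```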

To confirm our findings, we implemented the proposed method using the quadratic twist of $E_1$
over $\mathbb{F}_{p_1^2}$ given by $E'_1/\mathbb{F}_{p_1^2}: y^2 = x^3 + 9u$, where $E_1/\mathbb{F}_{p_1}: y^2 = x^3 + 9$, $p_1 = 2^{127} - 58309$, $u$ is
a non-square in $\mathbb{F}_{p_1^2}$ and $\#E'_1(\mathbb{F}_{p_1^2})$ is a 254-bit prime. Since $p_1 \equiv 3 \pmod 8$, we represent $\mathbb{F}_{p_1^2}$ as
$\mathbb{F}_{p_1}[i]$, where $i = \sqrt{-1}$. Let $u = 1 + i$. The two endomorphisms are given by $\Phi(x, y) = (\xi x, y) = \lambda P$
and $\Psi(x, y) = (u^{(1-p_1)/3} x^{p_1}, u^{(1-p_1)/2} y^{p_1}) = \mu P$, where $\xi^3 = 1 \bmod p_1$ and $\xi \neq 1$. Following Section 5 it can be
verified that $\Phi^2 + \Phi + 1 = 0$ and $\Psi^2 + 1 = 0$. Note that a similar curve was also used in [4],
but with a different 4-GLV construction. For the case of the standard 2-GLV, we use the curve
$E_2/\mathbb{F}_{p_2}: y^2 = x^3 + 2$, where $p_2 = 2^{256} - 11733$ and $\#E_2(\mathbb{F}_{p_2})$ is a 256-bit prime. The endomorphism
is given by $(x, y) \mapsto (\xi x, y)$, with $\xi^3 = 1 \bmod p_2$ and $\xi \neq 1$.

Table 1. Point multiplication timings (in clock cycles), 64-bit processor

Method | # of cores | AMD Phenom II
$E'_1(\mathbb{F}_{p_1^2})$, 127-bit $p$, 4GLV+INT, 7 pts. | 4 | 91,000
$E'_1(\mathbb{F}_{p_1^2})$, 127-bit $p$, 4GLV+INT, 7 pts. | 1 | 124,000
$E'_1(\mathbb{F}_{p_1^2})$, 127-bit $p$, wNAF, 7 pts. | 1 | 248,000
$E_2(\mathbb{F}_{p_2})$, 256-bit $p$, 2GLV+INT, 7 pts. | 2 | 136,000
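For readers implementing the arithmetic, here is a minimal Python sketch of $\mathbb{F}_{p_1^2} = \mathbb{F}_{p_1}[i]$ with $i^2 = -1$. It uses the common three-product (Karatsuba-style) multiplication, which is an assumption here and not necessarily the formula used in the authors' implementation, together with a square test based on the fact that an element of $\mathbb{F}_{p^2}$ is a square exactly when its norm is a square in $\mathbb{F}_p$. With $p_1 \equiv 3 \pmod 8$, both $-1$ and $2 = N(1 + i)$ are non-residues modulo $p_1$ (assuming, as stated, that $p_1$ is prime), which is why $u = 1 + i$ is a non-square; the criterion itself can be checked exhaustively on a small prime such as $p = 11 \equiv 3 \pmod 8$.

```python
P1 = 2**127 - 58309  # the prime used for E'_1 in the text (assumed prime, as stated)


def fp2_mul(a, b, p):
    """(a0 + a1*i)(b0 + b1*i) in F_p[i] with i^2 = -1, using three base-field products."""
    a0, a1 = a
    b0, b1 = b
    t0 = a0 * b0
    t1 = a1 * b1
    t2 = (a0 + a1) * (b0 + b1)  # = a0*b0 + a0*b1 + a1*b0 + a1*b1
    return ((t0 - t1) % p, (t2 - t0 - t1) % p)


def fp2_is_square(a, p):
    """a is a square in F_{p^2} iff its norm a0^2 + a1^2 is a square in F_p (Euler's criterion)."""
    norm = (a[0] * a[0] + a[1] * a[1]) % p
    return norm == 0 or pow(norm, (p - 1) // 2, p) == 1


def check_square_criterion(p):
    """Exhaustively compare fp2_is_square with the actual squares (small primes p = 3 mod 4 only)."""
    squares = {fp2_mul((x, y), (x, y), p) for x in range(p) for y in range(p)}
    return all(
        fp2_is_square((x, y), p) == ((x, y) in squares)
        for x in range(p)
        for y in range(p)
    )


# p1 = 3 (mod 8): -1 and 2 = N(1 + i) are non-residues (for prime p1), so F_{p1}[i] is a field
# and u = 1 + i is a non-square; p1 = 1 (mod 3) makes the exponent (1 - p1)/3 in Psi an integer.
print(P1 % 8, P1 % 3, P1.bit_length())
print(check_square_criterion(11), fp2_is_square((1, 1), 11))
```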

For our experiments we used a 3.0 GHz AMD Phenom II X4 940 processor with four cores. The
resulting timings, in clock cycles, are displayed in Table 1. As can be seen, closely following
our analysis and considering that the use of multiple cores introduces a certain penalty, 4-GLV on four
cores achieves a speedup close to 3x in comparison with a fully sequential non-GLV version on one core, and
supports a computation that runs in 0.67 of the time of the standard 2-GLV on two cores.

Although these experimental results correspond to a $j = 0$ curve, we can confidently state
that the relative improvement from 2-GLV to 4-GLV on any single GLV curve will be of the same
order. This follows theoretically from the minimality of Theorem 1 and the form of Theorem 3,
which together imply that the ratio $C_2/C_1$ of the decomposition constants is bounded independently of the curve.

9 Conclusion 

We have produced new families of GLV curves, and written down all such curves (up to isomorphism)
with nontrivial endomorphisms of degree ≤ 3. We have shown how to generalize the Gallant-Lambert-Vanstone
scalar multiplication method by combining it with the Galbraith-Lin-Scott ideas,
to perform a proven almost fourfold speedup on GLV curves over $\mathbb{F}_{p^2}$. We have provided a first
explicit bound on such a decomposition using the LLL algorithm. We have then refined this bound
using a faster new reduction algorithm, which consists basically of two applications of the extended
Euclidean algorithm, one in $\mathbb{Z}$ and the other in $\mathbb{Z}[i]$. This allows us to get a relative improvement
(on the same curve) from 2-GLV to 4-GLV independent of the curve.